Thread

Commits

  1. Add temporal FOREIGN KEY contraints

  2. Remove obsolete executor cleanup code

  1. Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-06-25T12:05:09Z

    Attached is a PoC patch to implement "Row pattern recognition" (RPR)
    in SQL:2016 (I know SQL:2023 is already out, but I don't have access
    to it). Actually SQL:2016 defines RPR in two places[1]:
    
        Feature R010, “Row pattern recognition: FROM clause”
        Feature R020, “Row pattern recognition: WINDOW clause”
    
    The patch includes partial support for the R020 part.
    
    - What is RPR?
    
    RPR provides a way to search series of data using regular expression
    patterns. Suppose you have a stock database.
    
    CREATE TABLE stock (
           company TEXT,
           tdate DATE,
           price BIGINT);
    
    You want to find a "V-shaped" pattern: i.e. price goes down for 1 or
    more days, then goes up for 1 or more days. If we try to implement the
    query in PostgreSQL, it could be quite complex and inefficient.
    
    RPR provides a convenient way to implement the query.
    
    SELECT company, tdate, price, rpr(price) OVER w FROM stock
     WINDOW w AS (
     PARTITION BY company
     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
     PATTERN (START DOWN+ UP+)
     DEFINE
      START AS TRUE,
      UP AS price > PREV(price),
      DOWN AS price < PREV(price)
    );
    
    "PATTERN" and "DEFINE" are the key clauses of RPR. DEFINE defines 3
    "row pattern variables" namely START, UP and DOWN. They are associated
    with logical expressions namely "TRUE", "price > PREV(price)", and
    "price < PREV(price)". Note that "PREV" function returns price column
    in the previous row. So, UP is true if price is higher than previous
    day. On the other hand, DOWN is true if price is lower than previous
    day.  PATTERN uses the row pattern variables to create a necessary
    pattern.  In this case, the first row always matches because START is
    always true, the following one or more rows match "DOWN" ('+' is a
    regular expression quantifier meaning one or more), and the
    subsequent rows match "UP".
    
    Here is the sample output.
    
     company  |   tdate    | price | rpr  
    ----------+------------+-------+------
     company1 | 2023-07-01 |   100 |     
     company1 | 2023-07-02 |   200 |  200 -- 200->150->140->150
     company1 | 2023-07-03 |   150 |  150 -- 150->140->150
     company1 | 2023-07-04 |   140 |     
     company1 | 2023-07-05 |   150 |  150 -- 150->90->110->130
     company1 | 2023-07-06 |    90 |     
     company1 | 2023-07-07 |   110 |     
     company1 | 2023-07-08 |   130 |     
     company1 | 2023-07-09 |   120 |     
     company1 | 2023-07-10 |   130 |     
    
    rpr returns its argument at the first row of the frame if the frame
    satisfies the whole pattern. In the example above these are the rows
    showing 200, 150 and 150. Other rows are shown as NULL. For example,
    on 2023-07-02 the price starts at 200, then goes down to 150 and
    140, and goes up to 150 the next day.
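
The DEFINE conditions in the query above can be illustrated outside the database. The following is a minimal Python sketch, not part of the patch: the helper name `classify` is hypothetical, and treating a missing previous row as "condition not true" (mirroring a comparison with SQL NULL) is an assumption made for illustration.

```python
from typing import List, Optional, Set

def classify(prices: List[int]) -> List[Set[str]]:
    """Return, for each row, the set of row pattern variables whose
    DEFINE condition is true for that row."""
    result = []
    for i, price in enumerate(prices):
        # PREV(price): the price in the previous row, if there is one.
        prev: Optional[int] = prices[i - 1] if i > 0 else None
        labels = {"START"}                      # START AS TRUE
        if prev is not None and price > prev:   # UP AS price > PREV(price)
            labels.add("UP")
        if prev is not None and price < prev:   # DOWN AS price < PREV(price)
            labels.add("DOWN")
        result.append(labels)
    return result

# Prices of company1 from the sample output, in tdate order.
prices = [100, 200, 150, 140, 150, 90, 110, 130, 120, 130]
labels = classify(prices)
```

With these labels, a frame satisfies PATTERN (START DOWN+ UP+) when its first row is followed by one or more rows labelled DOWN and then one or more rows labelled UP.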
    
    As far as I know, among OSS/commercial RDBMSs only Oracle implements
    RPR (only R010; R020 is not implemented). A few DWH products also
    have RPR. According to [2] they are Snowflake and MS Stream
    Analytics. It seems Trino is another one[3].
    
    - Note about the patch
    
    The current implementation is not only a subset of the standard, but
    is different from it in some places. The example above is written as
    follows according to the standard:
    
    SELECT company, tdate, startprice OVER w FROM stock
     WINDOW w AS (
     PARTITION BY company
     MEASURES
      START.price AS startprice
     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
     PATTERN (START DOWN+ UP+)
     DEFINE
      START AS TRUE,
      UP AS UP.price > PREV(UP.price),
      DOWN AS DOWN.price < PREV(DOWN.price)
    );
    
    Notice that rpr(price) is written as START.price and startprice in the
    standard. MEASURES defines variable names used in the target list used
    with "OVER window". As OVER only allows functions in PostgreSQL, I had
    to make up a window function "rpr" which performs the row pattern
    recognition task.  I was not able to find a way to implement
    expressions like START.price (START is not a table alias). Any
    suggestion is greatly appreciated.
    
    The differences from the standard include:
    
    o MEASURES is not supported
    o SUBSET is not supported
    o FIRST, LAST, CLASSIFIER are not supported
    o PREV/NEXT in the standard accept more complex arguments
    o Regular expressions other than "+" are not supported
    o Only AFTER MATCH SKIP TO NEXT ROW is supported (if AFTER MATCH is
      not specified, AFTER MATCH SKIP TO NEXT ROW is assumed. In the
      standard AFTER MATCH SKIP PAST LAST ROW is assumed in this case). I
      have a plan to implement AFTER MATCH SKIP PAST LAST ROW though.
    o INITIAL or SEEK are not supported (behaves as if INITIAL is specified)
    o Aggregate functions associated with window clause using RPR do not respect RPR
    
    It seems RPR in the standard is quite complex. I think we can start
    with a small subset of RPR then we could gradually enhance the
    implementation.
    
    Comments and suggestions are welcome.
    
    [1] https://sqlperformance.com/2019/04/t-sql-queries/row-pattern-recognition-in-sql
    [2] https://link.springer.com/article/10.1007/s13222-022-00404-3
    [3] https://trino.io/docs/current/sql/pattern-recognition-in-window.html
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  2. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-06-25T21:08:35Z

    On 6/25/23 14:05, Tatsuo Ishii wrote:
    > Attached is a PoC patch to implement "Row pattern recognition" (RPR)
    > in SQL:2016 (I know SQL:2023 is already out, but I don't have access
    > to it). Actually SQL:2016 defines RPR in two places[1]:
    > 
    >      Feature R010, “Row pattern recognition: FROM clause”
    >      Feature R020, “Row pattern recognition: WINDOW clause”
    > 
    > The patch includes partial support for R020 part.
    
    I have been dreaming of and lobbying for someone to pick up this
    feature.  I will be sure to review it from a standards perspective and
    will try my best to help with the technical aspect, but I am not sure
    I have the qualifications for that.
    
    THANK YOU!
    
    > (I know SQL:2023 is already out, but I don't have access to it)
    
    If you can, try to get ISO/IEC 19075-5 which is a guide to RPR instead 
    of just its technical specification.
    
    https://www.iso.org/standard/78936.html
    
    > - What is RPR?
    > 
    > RPR provides a way to search series of data using regular expression
    > patterns. Suppose you have a stock database.
    > 
    > CREATE TABLE stock (
    >         company TEXT,
    >         tdate DATE,
    >         price BIGINT);
    > 
    > You want to find a "V-shaped" pattern: i.e. price goes down for 1 or
    > more days, then goes up for 1 or more days. If we try to implement the
    > query in PostgreSQL, it could be quite complex and inefficient.
    > 
    > RPR provides convenient way to implement the query.
    > 
    > SELECT company, tdate, price, rpr(price) OVER w FROM stock
    >   WINDOW w AS (
    >   PARTITION BY company
    >   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >   PATTERN (START DOWN+ UP+)
    >   DEFINE
    >    START AS TRUE,
    >    UP AS price > PREV(price),
    >    DOWN AS price < PREV(price)
    > );
    > 
    > "PATTERN" and "DEFINE" are the key clauses of RPR. DEFINE defines 3
    > "row pattern variables" namely START, UP and DOWN. They are associated
    > with logical expressions namely "TRUE", "price > PREV(price)", and
    > "price < PREV(price)". Note that "PREV" function returns price column
    > in the previous row. So, UP is true if price is higher than previous
    > day. On the other hand, DOWN is true if price is lower than previous
    > day.  PATTERN uses the row pattern variables to create a necessary
    > pattern.  In this case, the first row is always match because START is
    > always true, and second or more rows match with "UP" ('+' is a regular
    > expression representing one or more), and subsequent rows match with
    > "DOWN".
    > 
    > Here is the sample output.
    > 
    >   company  |   tdate    | price | rpr
    > ----------+------------+-------+------
    >   company1 | 2023-07-01 |   100 |
    >   company1 | 2023-07-02 |   200 |  200 -- 200->150->140->150
    >   company1 | 2023-07-03 |   150 |  150 -- 150->140->150
    >   company1 | 2023-07-04 |   140 |
    >   company1 | 2023-07-05 |   150 |  150 -- 150->90->110->130
    >   company1 | 2023-07-06 |    90 |
    >   company1 | 2023-07-07 |   110 |
    >   company1 | 2023-07-08 |   130 |
    >   company1 | 2023-07-09 |   120 |
    >   company1 | 2023-07-10 |   130 |
    > 
    > rpr shows the first row if all the patterns are satisfied. In the
    > example above 200, 150, 150 are the cases.  Other rows are shown as
    > NULL. For example, on 2023-07-02 price starts with 200, then goes down
    > to 150 then 140 but goes up 150 next day.
    
    I don't understand this.  RPR in a window specification limits the 
    window to the matched rows, so this looks like your rpr() function is 
    just the regular first_value() window function that we already have?
    
    > As far as I know, only Oracle implements RPR (only R010. R020 is not
    > implemented) among OSS/commercial RDBMSs. There are a few DWH software
    > having RPR. According to [2] they are Snowflake and MS Stream
    > Analytics. It seems Trino is another one[3].
    
    I thought DuckDB had it already, but it looks like I was wrong.
    
    > - Note about the patch
    > 
    > The current implementation is not only a subset of the standard, but
    > is different from it in some places. The example above is written as
    > follows according to the standard:
    > 
    > SELECT company, tdate, startprice OVER w FROM stock
    >   WINDOW w AS (
    >   PARTITION BY company
    >   MEASURES
    >    START.price AS startprice
    >   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >   PATTERN (START DOWN+ UP+)
    >   DEFINE
    >    START AS TRUE,
    >    UP AS UP.price > PREV(UP.price),
    >    DOWN AS DOWN.price < PREV(DOWN.price)
    > );
    > 
    > Notice that rpr(price) is written as START.price and startprice in the
    > standard. MEASURES defines variable names used in the target list used
    > with "OVER window". As OVER only allows functions in PostgreSQL, I had
    > to make up a window function "rpr" which performs the row pattern
    > recognition task.  I was not able to find a way to implement
    > expressions like START.price (START is not a table alias). Any
    > suggestion is greatly appreciated.
    
    As in your example, you cannot have START.price outside of the window 
    specification; it can only go in the MEASURES clause.  Only startprice 
    is allowed outside and it gets its qualification from the OVER.  Using 
    w.startprice might have been better but that would require window names 
    to be in the same namespace as range tables.
    
    This currently works in Postgres:
    
       SELECT RANK() OVER w
       FROM (VALUES (1)) AS w (x)
       WINDOW w AS (ORDER BY w.x);
    
    > The differences from the standard include:
    > 
    > o MEASURES is not supported
    > o FIRST, LAST, CLASSIFIER are not supported
    > o PREV/NEXT in the standard accept more complex arguments
    > o INITIAL or SEEK are not supported ((behaves as if INITIAL is specified)
    
    Okay, for now.
    
    > o SUBSET is not supported
    
    Is this because you haven't done it yet, or because you ran into 
    problems trying to do it?
    
    > o Regular expressions other than "+" are not supported
    
    This is what I had a hard time imagining how to do while thinking about 
    it.  The grammar is so different here and we allow many more operators 
    (like "?" which is also the standard parameter symbol).  People more 
    expert than me will have to help here.
    
    > o Only AFTER MATCH SKIP TO NEXT ROW is supported (if AFTER MATCH is
    >    not specified, AFTER MATCH SKIP TO NEXT ROW is assumed. In the
    >    standard AFTER MATCH SKIP PAST LAST ROW is assumed in this case). I
    >    have a plan to implement AFTER MATCH SKIP PAST LAST ROW though.
    
    In this case, we should require the user to specify AFTER MATCH SKIP TO 
    NEXT ROW so that behavior doesn't change when we implement the standard 
    default.  (Your patch might do this already.)
    
    > o Aggregate functions associated with window clause using RPR do not respect RPR
    
    I do not understand what this means.
    
    > It seems RPR in the standard is quite complex. I think we can start
    > with a small subset of RPR then we could gradually enhance the
    > implementation.
    
    I have no problem with that as long as we don't paint ourselves into a 
    corner.
    
    > Comments and suggestions are welcome.
    
    I have not looked at the patch yet, but is the reason for doing R020 
    before R010 because you haven't done the MEASURES clause yet?
    
    In any case, I will be watching this with a close eye, and I am eager to 
    help in any way I can.
    -- 
    Vik Fearing
    
    
    
    
    
  3. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-06-26T01:05:20Z

    > I have been dreaming of and lobbying for someone to pick up this
    > feature.  I will be sure to review it from a standards perspective and
    > will try my best to help with the technical aspect, but I am not sure
    > to have the qualifications for that.
    > 
    > THANK YOU!
    
    Thank you for looking into my proposal.
    
    >> (I know SQL:2023 is already out, but I don't have access to it)
    > 
    > If you can, try to get ISO/IEC 19075-5 which is a guide to RPR instead
    > of just its technical specification.
    > 
    > https://www.iso.org/standard/78936.html
    
    Thanks for the info.
    
    > I don't understand this.  RPR in a window specification limits the
    > window to the matched rows, so this looks like your rpr() function is
    > just the regular first_value() window function that we already have?
    
    No, rpr() is different from first_value(). rpr() returns the argument
    value at the first row in a frame only when matched rows are found.
    On the other hand first_value() returns the argument value at the
    first row in a frame unconditionally.
    
     company  |   tdate    | price | rpr  | first_value 
    ----------+------------+-------+------+-------------
     company1 | 2023-07-01 |   100 |      |         100
     company1 | 2023-07-02 |   200 |  200 |         200
     company1 | 2023-07-03 |   150 |  150 |         150
     company1 | 2023-07-04 |   140 |      |         140
     company1 | 2023-07-05 |   150 |  150 |         150
     company1 | 2023-07-06 |    90 |      |          90
     company1 | 2023-07-07 |   110 |      |         110
     company1 | 2023-07-08 |   130 |      |         130
     company1 | 2023-07-09 |   120 |      |         120
     company1 | 2023-07-10 |   130 |      |         130
    
    For example, a frame starting with (tdate = 2023-07-02, price = 200)
    consists of rows (price = 200, 150, 140, 150) satisfying the pattern,
    thus rpr returns 200. Since in this example frame option "ROWS BETWEEN
    CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts
    with (tdate = 2023-07-03, price = 150). This frame satisfies the
    pattern too (price = 150, 140, 150), and rpr returns 150... and so on.
    
    > As in your example, you cannot have START.price outside of the window
    > specification; it can only go in the MEASURES clause.  Only startprice
    > is allowed outside and it gets its qualification from the OVER.  Using
    > w.startprice might have been better but that would require window
    > names to be in the same namespace as range tables.
    > 
    > This currently works in Postgres:
    > 
    >   SELECT RANK() OVER w
    >   FROM (VALUES (1)) AS w (x)
    >   WINDOW w AS (ORDER BY w.x);
    
    Interesting.
    
    >> o SUBSET is not supported
    > 
    > Is this because you haven't done it yet, or because you ran into
    > problems trying to do it?
    
    Because it seems SUBSET is not useful without MEASURES support. Thus
    my plan is to implement MEASURES first, then SUBSET. What do you
    think?
    
    >> o Regular expressions other than "+" are not supported
    > 
    > This is what I had a hard time imagining how to do while thinking
    > about it.  The grammar is so different here and we allow many more
    > operators (like "?" which is also the standard parameter symbol).
    > People more expert than me will have to help here.
    
    Yes, that is a problem.
    
    > In this case, we should require the user to specify AFTER MATCH SKIP
    > TO NEXT ROW so that behavior doesn't change when we implement the
    > standard default.  (Your patch might do this already.)
    
    Agreed. I will implement AFTER MATCH SKIP PAST LAST ROW in the next
    patch and I will change the default to AFTER MATCH SKIP PAST LAST ROW.
    
    >> o Aggregate functions associated with window clause using RPR do not
    >> respect RPR
    > 
    > I do not understand what this means.
    
    Ok, let me explain. See example below. In my understanding "count"
    should return the number of rows in a frame restricted by the match
    condition. For example at the first line (2023-07-01 | 100) count
    returns 10. I think this should be 0 because the "restricted" frame
    starting at the line contains no matched row. On the other hand the
    (restricted) frame starting at second line (2023-07-02 | 200) contains
    4 rows, thus count should return 4, instead of 9.
    
    SELECT company, tdate, price, rpr(price) OVER w, count(*) OVER w FROM stock
     WINDOW w AS (
     PARTITION BY company
     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
     PATTERN (START DOWN+ UP+)
     DEFINE
      START AS TRUE,
      UP AS price > PREV(price),
      DOWN AS price < PREV(price)
    );
    
     company  |   tdate    | price | rpr  | count 
    ----------+------------+-------+------+-------
     company1 | 2023-07-01 |   100 |      |    10
     company1 | 2023-07-02 |   200 |  200 |     9
     company1 | 2023-07-03 |   150 |  150 |     8
     company1 | 2023-07-04 |   140 |      |     7
     company1 | 2023-07-05 |   150 |  150 |     6
     company1 | 2023-07-06 |    90 |      |     5
     company1 | 2023-07-07 |   110 |      |     4
     company1 | 2023-07-08 |   130 |      |     3
     company1 | 2023-07-09 |   120 |      |     2
     company1 | 2023-07-10 |   130 |      |     1
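
The counts argued for above can be checked with a small emulation. This is a minimal Python sketch under stated assumptions, not the patch's logic: `match_length` is a hypothetical helper, every row starts its own match attempt (as with AFTER MATCH SKIP TO NEXT ROW), and a simple greedy scan is enough only because DOWN and UP can never both be true for the same row.

```python
from typing import List

def match_length(prices: List[int], start: int) -> int:
    """Number of rows matched by PATTERN (START DOWN+ UP+) when the
    match begins at row `start`; 0 if the pattern does not match there."""
    n = len(prices)
    i = start + 1                       # row `start` is START (always true)
    downs = 0
    while i < n and prices[i] < prices[i - 1]:   # DOWN+
        i += 1
        downs += 1
    ups = 0
    while i < n and prices[i] > prices[i - 1]:   # UP+
        i += 1
        ups += 1
    return i - start if downs >= 1 and ups >= 1 else 0

# Prices of company1 in tdate order.
prices = [100, 200, 150, 140, 150, 90, 110, 130, 120, 130]
# count(*) over the frame restricted by the match, one value per row.
counts = [match_length(prices, s) for s in range(len(prices))]
```

For the first two rows this yields 0 and 4, the values the message says count should return instead of 10 and 9.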
    
    >> It seems RPR in the standard is quite complex. I think we can start
    >> with a small subset of RPR then we could gradually enhance the
    >> implementation.
    > 
    > I have no problem with that as long as we don't paint ourselves into a
    > corner.
    
    Totally agreed.
    
    >> Comments and suggestions are welcome.
    > 
    > I have not looked at the patch yet, but is the reason for doing R020
    > before R010 because you haven't done the MEASURES clause yet?
    
    One of the reasons is that implementing MATCH_RECOGNIZE (R010) looked
    harder to me because modifying the main SELECT clause could be hard
    work. Another reason is that I had no idea how to implement PREV/NEXT
    anywhere other than in the WINDOW clause. Other people might feel
    differently though.
    
    > In any case, I will be watching this with a close eye, and I am eager
    > to help in any way I can.
    
    Thank you! I am looking forward to comments on my patch.  Also any
    idea on how to implement the MEASURES clause is welcome.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  4. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-06-26T08:45:07Z

    >> In this case, we should require the user to specify AFTER MATCH SKIP
    >> TO NEXT ROW so that behavior doesn't change when we implement the
    >> standard default.  (Your patch might do this already.)
    > 
    > Agreed. I will implement AFTER MATCH SKIP PAST LAST ROW in the next
    > patch and I will change the default to AFTER MATCH SKIP PAST LAST ROW.
    
    Attached is the v2 patch to add support for AFTER MATCH SKIP TO NEXT
    ROW and AFTER MATCH SKIP PAST LAST ROW. The default is AFTER MATCH
    SKIP PAST LAST ROW, as in the standard. Here are some examples to
    demonstrate how those clauses affect the query result.
    
    SELECT i, rpr(i) OVER w
      FROM (VALUES (1), (2), (3), (4)) AS v (i)
      WINDOW w AS (
       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
       AFTER MATCH SKIP PAST LAST ROW
       PATTERN (A B)
       DEFINE
        A AS i <= 2,
        B AS i <= 3
    );
     i | rpr 
    ---+-----
     1 |   1
     2 |    
     3 |    
     4 |    
    (4 rows)
    
    In this example rpr starts from i = 1 and finds that row i = 1
    satisfies A, and row i = 2 satisfies B. Then rpr skips past the
    matched rows and moves to row i = 3, and finds that it does not
    satisfy A, thus the result is NULL. The same can be said of row
    i = 4.
    
    SELECT i, rpr(i) OVER w
      FROM (VALUES (1), (2), (3), (4)) AS v (i)
      WINDOW w AS (
       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
       AFTER MATCH SKIP TO NEXT ROW
       PATTERN (A B)
       DEFINE
        A AS i <= 2,
        B AS i <= 3
    );
     i | rpr 
    ---+-----
     1 |   1
     2 |   2
     3 |    
     4 |    
    (4 rows)
    
    In this example rpr starts from i = 1 and finds that row i = 1
    satisfies A, and row i = 2 satisfies B (same as above). Then rpr moves
    to row i = 2, rather than 3 because AFTER MATCH SKIP TO NEXT ROW is
    specified.
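
The two AFTER MATCH SKIP behaviours can be emulated in a few lines. This is a minimal Python sketch, not the patch's implementation: the function `rpr` below is a hypothetical stand-in that only handles patterns made of single-row variables without quantifiers, such as PATTERN (A B).

```python
from typing import Callable, List, Optional, Sequence

Predicate = Callable[[int], bool]

def rpr(values: Sequence[int], pattern: Sequence[Predicate],
        skip_past_last_row: bool) -> List[Optional[int]]:
    """Emulate rpr(i): the value at the first row of each match,
    None (NULL) for every other row."""
    out: List[Optional[int]] = [None] * len(values)
    pos = 0
    while pos < len(values):
        window = values[pos:pos + len(pattern)]
        matched = (len(window) == len(pattern)
                   and all(p(v) for p, v in zip(pattern, window)))
        if matched:
            out[pos] = values[pos]
        # After a match, either jump past its last row or move one row on.
        pos += len(pattern) if matched and skip_past_last_row else 1
    return out

a = lambda i: i <= 2   # A AS i <= 2
b = lambda i: i <= 3   # B AS i <= 3
past_last = rpr([1, 2, 3, 4], [a, b], skip_past_last_row=True)
to_next = rpr([1, 2, 3, 4], [a, b], skip_past_last_row=False)
```

Here `past_last` is `[1, None, None, None]` and `to_next` is `[1, 2, None, None]`, matching the two result sets above.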
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  5. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-06-26T22:38:20Z

    On 6/26/23 03:05, Tatsuo Ishii wrote:
    >> I don't understand this.  RPR in a window specification limits the
    >> window to the matched rows, so this looks like your rpr() function is
    >> just the regular first_value() window function that we already have?
    > 
    > No, rpr() is different from first_value(). rpr() returns the argument
    > value at the first row in a frame only when matched rows found. On the
    > other hand first_value() returns the argument value at the first row
    > in a frame unconditionally.
    > 
    > company  |   tdate    | price | rpr  | first_value
    > ----------+------------+-------+------+-------------
    >   company1 | 2023-07-01 |   100 |      |         100
    >   company1 | 2023-07-02 |   200 |  200 |         200
    >   company1 | 2023-07-03 |   150 |  150 |         150
    >   company1 | 2023-07-04 |   140 |      |         140
    >   company1 | 2023-07-05 |   150 |  150 |         150
    >   company1 | 2023-07-06 |    90 |      |          90
    >   company1 | 2023-07-07 |   110 |      |         110
    >   company1 | 2023-07-08 |   130 |      |         130
    >   company1 | 2023-07-09 |   120 |      |         120
    >   company1 | 2023-07-10 |   130 |      |         130
    > 
    > For example, a frame starting with (tdate = 2023-07-02, price = 200)
    > consists of rows (price = 200, 150, 140, 150) satisfying the pattern,
    > thus rpr returns 200. Since in this example frame option "ROWS BETWEEN
    > CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts
    > with (tdate = 2023-07-03, price = 150). This frame satisfies the
    > pattern too (price = 150, 140, 150), and rpr retus 150... and so on.
    
    
    Okay, I see the problem now, and why you need the rpr() function.
    
    You are doing this as something that happens over a window frame, but it 
    is actually something that *reduces* the window frame.  The pattern 
    matching needs to be done when the frame is calculated and not when any 
    particular function is applied over it.
    
    This query (with all the defaults made explicit):
    
    SELECT s.company, s.tdate, s.price,
            FIRST_VALUE(s.tdate) OVER w,
            LAST_VALUE(s.tdate) OVER w,
            lowest OVER w
    FROM stock AS s
    WINDOW w AS (
       PARTITION BY s.company
       ORDER BY s.tdate
       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
       EXCLUDE NO OTHERS
       MEASURES
         LAST(DOWN) AS lowest
       AFTER MATCH SKIP PAST LAST ROW
       INITIAL PATTERN (START DOWN+ UP+)
       DEFINE
         START AS TRUE,
         UP AS price > PREV(price),
         DOWN AS price < PREV(price)
    );
    
    Should produce this result:
    
      company  |   tdate    | price | first_value | last_value | lowest
    ----------+------------+-------+-------------+------------+--------
      company1 | 07-01-2023 |   100 |             |            |
      company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |    140
      company1 | 07-03-2023 |   150 |             |            |
      company1 | 07-04-2023 |   140 |             |            |
      company1 | 07-05-2023 |   150 |             |            |
      company1 | 07-06-2023 |    90 |             |            |
      company1 | 07-07-2023 |   110 |             |            |
      company1 | 07-08-2023 |   130 | 07-08-2023  | 07-10-2023 |    120
      company1 | 07-09-2023 |   120 |             |            |
      company1 | 07-10-2023 |   130 |             |            |
    (10 rows)
    
    Or if we switch to AFTER MATCH SKIP TO NEXT ROW, then we get:
    
      company  |   tdate    | price | first_value | last_value | lowest
    ----------+------------+-------+-------------+------------+--------
      company1 | 07-01-2023 |   100 |             |            |
      company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |    140
      company1 | 07-03-2023 |   150 | 07-03-2023  | 07-05-2023 |    140
      company1 | 07-04-2023 |   140 |             |            |
      company1 | 07-05-2023 |   150 | 07-05-2023  | 07-08-2023 |     90
      company1 | 07-06-2023 |    90 |             |            |
      company1 | 07-07-2023 |   110 |             |            |
      company1 | 07-08-2023 |   130 | 07-08-2023  | 07-10-2023 |    120
      company1 | 07-09-2023 |   120 |             |            |
      company1 | 07-10-2023 |   130 |             |            |
    (10 rows)
    
    And then if we change INITIAL to SEEK:
    
      company  |   tdate    | price | first_value | last_value | lowest
    ----------+------------+-------+-------------+------------+--------
      company1 | 07-01-2023 |   100 | 07-02-2023  | 07-05-2023 |    140
      company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |    140
      company1 | 07-03-2023 |   150 | 07-03-2023  | 07-05-2023 |    140
      company1 | 07-04-2023 |   140 | 07-05-2023  | 07-08-2023 |     90
      company1 | 07-05-2023 |   150 | 07-05-2023  | 07-08-2023 |     90
      company1 | 07-06-2023 |    90 | 07-08-2023  | 07-10-2023 |    120
      company1 | 07-07-2023 |   110 | 07-08-2023  | 07-10-2023 |    120
      company1 | 07-08-2023 |   130 | 07-08-2023  | 07-10-2023 |    120
      company1 | 07-09-2023 |   120 |             |            |
      company1 | 07-10-2023 |   130 |             |            |
    (10 rows)
    
    Since the pattern recognition is part of the frame, the window 
    aggregates should Just Work.
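
The idea of reducing the frame before any function sees it can be sketched as follows. This is a minimal Python emulation under stated assumptions, not a proposal for the executor: `match_at` and `reduced_frames` are hypothetical names, the matcher is hard-wired to PATTERN (START DOWN+ UP+), and the combination of SEEK with AFTER MATCH SKIP PAST LAST ROW is not discussed in this message, so the sketch's behaviour for it should not be relied on.

```python
from typing import List, Optional, Tuple

Frame = Tuple[int, int, int]   # (first row, last row, last DOWN row)

def match_at(prices: List[int], start: int) -> Optional[Frame]:
    """Match PATTERN (START DOWN+ UP+) beginning exactly at `start`."""
    n, i = len(prices), start + 1
    while i < n and prices[i] < prices[i - 1]:   # DOWN+
        i += 1
    last_down = i - 1
    if last_down == start:
        return None                               # no DOWN row
    while i < n and prices[i] > prices[i - 1]:   # UP+
        i += 1
    if i - 1 == last_down:
        return None                               # no UP row
    return (start, i - 1, last_down)

def reduced_frames(prices: List[int], skip_past_last_row: bool,
                   seek: bool = False) -> List[Optional[Frame]]:
    """Reduced frame for each current row; None means an empty frame."""
    n = len(prices)
    frames: List[Optional[Frame]] = [None] * n
    blocked_until = -1     # last row consumed by the previous match
    for row in range(n):
        if skip_past_last_row and row <= blocked_until:
            continue       # row lies inside an earlier match
        # INITIAL: the match must start at the current row.
        # SEEK: the match may start at any later row of the full frame.
        candidates = range(row, n) if seek else [row]
        for start in candidates:
            m = match_at(prices, start)
            if m is not None:
                frames[row] = m
                blocked_until = m[1]
                break
    return frames

prices = [100, 200, 150, 140, 150, 90, 110, 130, 120, 130]
past_last = reduced_frames(prices, skip_past_last_row=True)
to_next = reduced_frames(prices, skip_past_last_row=False)
seek = reduced_frames(prices, skip_past_last_row=False, seek=True)
```

Row indexes are zero-based positions in tdate order, so index 1 is 07-02-2023. Window functions and aggregates would then be evaluated over rows `first..last` of each frame, for example `prices[f[2]]` for the `lowest` measure of a frame `f`.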
    
    
    >>> o SUBSET is not supported
    >>
    >> Is this because you haven't done it yet, or because you ran into
    >> problems trying to do it?
    > 
    > Because it seems SUBSET is not useful without MEASURES support. Thus
    > my plan is, firstly implement MEASURES, then SUBSET. What do you
    > think?
    
    
    SUBSET elements can be used in DEFINE clauses, but I do not think this 
    is important compared to other features.
    
    
    >>> Comments and suggestions are welcome.
    >>
    >> I have not looked at the patch yet, but is the reason for doing R020
    >> before R010 because you haven't done the MEASURES clause yet?
    > 
    > One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
    > harder for me because modifying main SELECT clause could be a hard
    > work. Another reason is, I had no idea how to implement PREV/NEXT in
    > other than in WINDOW clause. Other people might feel differently
    > though.
    
    
    I think we could do this with a single tuplesort if we use backtracking 
    (which might be really slow for some patterns).  I have not looked into 
    it in any detail.
    
    We would need to be able to remove tuples from the end (even if only 
    logically), and be able to update tuples inside the store.  Both of 
    those needs come from backtracking and possibly changing the classifier.
    
    Without backtracking, I don't see how we could do it without having a
    separate tuplestore for every current possible match.
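
To make the backtracking idea concrete, here is a minimal Python sketch. It is an illustration only and says nothing about how a tuplestore would be used: the `match` function and the `Element` tuple are hypothetical, only the "+" quantifier is handled, and predicates look at a single row value rather than at PREV/NEXT.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A pattern element is (variable name, predicate, quantifier); the
# quantifier is "" (exactly one row) or "+" (one or more rows).
Element = Tuple[str, Callable[[int], bool], str]

def match(rows: Sequence[int], pattern: Sequence[Element],
          pos: int = 0, elem: int = 0) -> Optional[List[str]]:
    """Return the classifier (one variable name per matched row) of the
    greedy match starting at `pos`, or None if there is no match."""
    if elem == len(pattern):
        return []
    name, pred, quant = pattern[elem]
    # Find how many consecutive rows this element could consume.
    limit = pos
    while limit < len(rows) and pred(rows[limit]):
        limit += 1
        if quant == "":
            break
    # Try the longest run first; on failure give rows back one at a time.
    for end in range(limit, pos, -1):
        rest = match(rows, pattern, end, elem + 1)
        if rest is not None:
            return [name] * (end - pos) + rest
    return None

# PATTERN (A+ B) with A AS v >= 1, B AS v >= 3 over rows 1, 2, 3.
pattern = [("A", lambda v: v >= 1, "+"), ("B", lambda v: v >= 3, "")]
classifier = match([1, 2, 3], pattern)
```

A+ first consumes all three rows, B then fails, and the matcher gives the last row back so that the classifier becomes A, A, B. This is the "changing the classifier" effect mentioned above, and the retry loop is also why some patterns can be slow.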
    
    
    >> In any case, I will be watching this with a close eye, and I am eager
    >> to help in any way I can.
    > 
    > Thank you! I am looking forward to comments on my patch.  Also any
    > idea how to implement MEASURES clause is welcome.
    
    
    I looked at your v2 patches a little bit and the only comment that I 
    currently have on the code is you spelled PERMUTE as PREMUTE. 
    Everything else is hopefully explained above.
    -- 
    Vik Fearing
    
    
    
    
    
  6. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-06-28T00:58:19Z

    > Okay, I see the problem now, and why you need the rpr() function.
    > 
    > You are doing this as something that happens over a window frame, but
    > it is actually something that *reduces* the window frame.  The pattern
    > matching needs to be done when the frame is calculated and not when
    > any particular function is applied over it.
    
    Yes. (I think the standard calls the window frame the "full window
    frame" in the context of RPR, in contrast with the subset of the
    frame rows restricted by RPR. The paper I referred to as [2] claims
    that the latter window frame is called the "reduced window frame" in
    the standard, but I wasn't able to find the term in the standard.)
    
    I wanted to demonstrate that the pattern matching logic is basically
    correct in the PoC patch. Now what I need to do is move the row
    pattern matching logic to somewhere inside nodeWindowAgg so that the
    "restricted window frame" can be applied to all window functions and
    window aggregates. Currently I am looking into update_frameheadpos()
    and update_frametailpos(), which calculate the frame head and tail
    against the current row. What do you think?
    
    > This query (with all the defaults made explicit):
    > 
    > SELECT s.company, s.tdate, s.price,
    >        FIRST_VALUE(s.tdate) OVER w,
    >        LAST_VALUE(s.tdate) OVER w,
    >        lowest OVER w
    > FROM stock AS s
    > WINDOW w AS (
    >   PARTITION BY s.company
    >   ORDER BY s.tdate
    >   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >   EXCLUDE NO OTHERS
    >   MEASURES
    >     LAST(DOWN) AS lowest
    >   AFTER MATCH SKIP PAST LAST ROW
    >   INITIAL PATTERN (START DOWN+ UP+)
    >   DEFINE
    >     START AS TRUE,
    >     UP AS price > PREV(price),
    >     DOWN AS price < PREV(price)
    > );
    > 
    > Should produce this result:
    
    [snip]
    
    Thanks for the examples. I agree with the expected query results.
    
    >>>> o SUBSET is not supported
    >>>
    >>> Is this because you haven't done it yet, or because you ran into
    >>> problems trying to do it?
    >> Because it seems SUBSET is not useful without MEASURES support. Thus
    >> my plan is, firstly implement MEASURES, then SUBSET. What do you
    >> think?
    > 
    > 
    > SUBSET elements can be used in DEFINE clauses, but I do not think this
    > is important compared to other features.
    
    Ok.
    
    >>> I have not looked at the patch yet, but is the reason for doing R020
    >>> before R010 because you haven't done the MEASURES clause yet?
    >> One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
    >> harder for me because modifying main SELECT clause could be a hard
    >> work. Another reason is, I had no idea how to implement PREV/NEXT in
    >> other than in WINDOW clause. Other people might feel differently
    >> though.
    > 
    > 
    > I think we could do this with a single tuplesort if we use
    > backtracking (which might be really slow for some patterns).  I have
    > not looked into it in any detail.
    > 
    > We would need to be able to remove tuples from the end (even if only
    > logically), and be able to update tuples inside the store.  Both of
    > those needs come from backtracking and possibly changing the
    > classifier.
    > 
    > Without backtracking, I don't see how we could do it without have a
    > separate tuplestore for every current possible match.
    
    Maybe an insane idea, but what about rewriting the MATCH_RECOGNIZE
    clause into a WINDOW clause with RPR?
    
    > I looked at your v2 patches a little bit and the only comment that I
    > currently have on the code is you spelled PERMUTE as
    > PREMUTE. Everything else is hopefully explained above.
    
    Thanks. Will fix.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  7. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-06-28T12:17:00Z

    Small question.
    
    > This query (with all the defaults made explicit):
    > 
    > SELECT s.company, s.tdate, s.price,
    >        FIRST_VALUE(s.tdate) OVER w,
    >        LAST_VALUE(s.tdate) OVER w,
    >        lowest OVER w
    > FROM stock AS s
    > WINDOW w AS (
    >   PARTITION BY s.company
    >   ORDER BY s.tdate
    >   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >   EXCLUDE NO OTHERS
    >   MEASURES
    >     LAST(DOWN) AS lowest
    >   AFTER MATCH SKIP PAST LAST ROW
    >   INITIAL PATTERN (START DOWN+ UP+)
    >   DEFINE
    >     START AS TRUE,
    >     UP AS price > PREV(price),
    >     DOWN AS price < PREV(price)
    > );
    
    >     LAST(DOWN) AS lowest
    
    should be "LAST(DOWN.price) AS lowest"?
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  8. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-06-28T22:30:43Z

    On 6/28/23 14:17, Tatsuo Ishii wrote:
    > Small question.
    > 
    >> This query (with all the defaults made explicit):
    >>
    >> SELECT s.company, s.tdate, s.price,
    >>         FIRST_VALUE(s.tdate) OVER w,
    >>         LAST_VALUE(s.tdate) OVER w,
    >>         lowest OVER w
    >> FROM stock AS s
    >> WINDOW w AS (
    >>    PARTITION BY s.company
    >>    ORDER BY s.tdate
    >>    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >>    EXCLUDE NO OTHERS
    >>    MEASURES
    >>      LAST(DOWN) AS lowest
    >>    AFTER MATCH SKIP PAST LAST ROW
    >>    INITIAL PATTERN (START DOWN+ UP+)
    >>    DEFINE
    >>      START AS TRUE,
    >>      UP AS price > PREV(price),
    >>      DOWN AS price < PREV(price)
    >> );
    > 
    >>      LAST(DOWN) AS lowest
    > 
    > should be "LAST(DOWN.price) AS lowest"?
    
    Yes, it should be.  And the tdate='07-08-2023' row in the first 
    resultset should have '07-08-2023' and '07-10-2023' as its 4th and 5th 
    columns.
    
    Since my brain is doing the processing instead of postgres, I made some 
    human errors. :-)
    -- 
    Vik Fearing
    
    
    
    
    
  9. Re: Row pattern recognition

    Jacob Champion <jchampion@timescale.com> — 2023-07-19T16:30:40Z

    Hello,
    
    Thanks for working on this! We're interested in RPR as well, and I've
    been trying to get up to speed with the specs, to maybe make myself
    useful.
    
    On 6/27/23 17:58, Tatsuo Ishii wrote:
    > Yes. (I think the standard calls the window frame as "full window
    > frame" in context of RPR to make a contrast with the subset of the
    > frame rows restricted by RPR. The paper I refered to as [2] claims
    > that the latter window frame is called "reduced window frame" in the
    > standard but I wasn't able to find the term in the standard.)
    
    19075-5 discusses that, at least; not sure about other parts of the spec.
    
    > Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
    > into Window clause with RPR?
    
    Are we guaranteed to always have an equivalent window clause? There seem
    to be many differences between the two, especially when it comes to ONE
    ROW/ALL ROWS PER MATCH.
    
    --
    
    To add onto what Vik said above:
    
    >> It seems RPR in the standard is quite complex. I think we can start
    >> with a small subset of RPR then we could gradually enhance the
    >> implementation.
    > 
    > I have no problem with that as long as we don't paint ourselves into a 
    > corner.
    
    To me, PATTERN looks like an area where we may want to support a broader
    set of operations in the first version. The spec has a bunch of
    discussion around cases like empty matches, match order of alternation
    and permutation, etc., which are not possible to express or test with
    only the + quantifier. Those might be harder to get right in a v2, if we
    don't at least keep them in mind for v1?
    
    > +static List *
    > +transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef)
    > +{
    > +   List        *patterns;
    
    My compiler complains about the `patterns` variable here, which is
    returned without ever being initialized. (The caller doesn't seem to use
    it.)
    
    > +-- basic test using PREV
    > +SELECT company, tdate, price, rpr(price) OVER w FROM stock
    > + WINDOW w AS (
    > + PARTITION BY company
    > + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    > + INITIAL
    > + PATTERN (START UP+ DOWN+)
    > + DEFINE
    > +  START AS TRUE,
    > +  UP AS price > PREV(price),
    > +  DOWN AS price < PREV(price)
    > +);
    
    nitpick: IMO the tests should be making use of ORDER BY in the window
    clauses.
    
    This is a very big feature. I agree with you that MEASURES seems like a
    very important "next piece" to add. Are there other areas where you
    would like reviewers to focus on right now (or avoid)?
    
    Thanks!
    --Jacob
    
    
    
    
  10. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-20T05:15:13Z

    > Hello,
    > 
    > Thanks for working on this! We're interested in RPR as well, and I've
    > been trying to get up to speed with the specs, to maybe make myself
    > useful.
    
    Thank you for being interested in this.
    
    > 19075-5 discusses that, at least; not sure about other parts of the spec.
    
    Thanks for the info. Unfortunately I don't have 19075-5 though.
    
    >> Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
    >> into Window clause with RPR?
    > 
    > Are we guaranteed to always have an equivalent window clause? There seem
    > to be many differences between the two, especially when it comes to ONE
    > ROW/ALL ROWS PER MATCH.
    
    You are right. I am not 100% sure if the rewriting is possible at this
    point.
    
    > To add onto what Vik said above:
    > 
    >>> It seems RPR in the standard is quite complex. I think we can start
    >>> with a small subset of RPR then we could gradually enhance the
    >>> implementation.
    >> 
    >> I have no problem with that as long as we don't paint ourselves into a 
    >> corner.
    > 
    > To me, PATTERN looks like an area where we may want to support a broader
    > set of operations in the first version.
    
    Me too but...
    
    > The spec has a bunch of
    > discussion around cases like empty matches, match order of alternation
    > and permutation, etc., which are not possible to express or test with
    > only the + quantifier. Those might be harder to get right in a v2, if we
    > don't at least keep them in mind for v1?
    
    Currently my patch has a limitation for the sake of a simple
    implementation: a pattern like "A+" is parsed and analyzed in the raw
    parser. This makes subsequent processing much easier because the
    pattern element variable (in this case "A") and the quantifier (in
    this case "+") are already identified by the raw parser. However, many
    more cases are allowed in the standard, as you already pointed out.
    For those cases we should probably give up parsing PATTERN items in
    the raw parser, and instead have the raw parser just accept the
    elements as Sconst?
    
    >> +static List *
    >> +transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef)
    >> +{
    >> +   List        *patterns;
    > 
    > My compiler complains about the `patterns` variable here, which is
    > returned without ever being initialized. (The caller doesn't seem to use
    > it.)
    
    Will fix.
    
    >> +-- basic test using PREV
    >> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
    >> + WINDOW w AS (
    >> + PARTITION BY company
    >> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >> + INITIAL
    >> + PATTERN (START UP+ DOWN+)
    >> + DEFINE
    >> +  START AS TRUE,
    >> +  UP AS price > PREV(price),
    >> +  DOWN AS price < PREV(price)
    >> +);
    > 
    > nitpick: IMO the tests should be making use of ORDER BY in the window
    > clauses.
    
    Right. Will fix.
    
    > This is a very big feature. I agree with you that MEASURES seems like a
    > very important "next piece" to add. Are there other areas where you
    > would like reviewers to focus on right now (or avoid)?
    
    Any comments are welcome, especially on the PREV/NEXT implementation.
    Currently a DEFINE expression like "price > PREV(price)" is
    prepared in ExecInitWindowAgg using ExecInitExpr, tweaking var->varno
    in the Var node so that PREV uses OUTER_VAR and NEXT uses INNER_VAR.
    The expression is then evaluated in ExecWindowAgg using ExecEvalExpr,
    with the previous row's TupleSlot set in ExprContext->ecxt_outertuple
    and the next row's TupleSlot in ExprContext->ecxt_innertuple. I think
    this is a temporary hack and should be gotten rid of before v1 is
    committed. Better idea?
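    A minimal stand-alone sketch of the evaluation scheme described
    above, written in Python rather than PostgreSQL C. It only models the
    idea of handing a DEFINE condition the previous, current and next
    rows; the names eval_define, up and down and the prices are
    hypothetical and do not come from the patch.

```python
from typing import Callable, Optional, Sequence

Row = dict  # one table row, e.g. {"price": 100}
Condition = Callable[[Row, Optional[Row], Optional[Row]], bool]


def eval_define(cond: Condition, rows: Sequence[Row], pos: int) -> bool:
    """Evaluate one DEFINE condition for rows[pos].

    prev_row stands in for the slot the thread binds to OUTER_VAR and
    next_row for the slot bound to INNER_VAR; None models a missing row.
    """
    prev_row = rows[pos - 1] if pos > 0 else None
    next_row = rows[pos + 1] if pos + 1 < len(rows) else None
    return cond(rows[pos], prev_row, next_row)


def up(cur, prev, nxt):
    # UP AS price > PREV(price); no previous row means the test is not true
    return prev is not None and cur["price"] > prev["price"]


def down(cur, prev, nxt):
    # DOWN AS price < PREV(price)
    return prev is not None and cur["price"] < prev["price"]


rows = [{"price": p} for p in (100, 90, 80, 95, 110)]  # made-up prices
up_flags = [eval_define(up, rows, i) for i in range(len(rows))]
down_flags = [eval_define(down, rows, i) for i in range(len(rows))]
```

    With only one "previous" and one "next" slot, a condition can reach
    exactly one row in each direction, which is the limitation the
    paragraph calls a temporary hack.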
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  11. Re: Row pattern recognition

    Jacob Champion <jchampion@timescale.com> — 2023-07-20T23:36:37Z

    Hi Ishii-san,
    
    On 7/19/23 22:15, Tatsuo Ishii wrote:
    > Currently my patch has a limitation for the sake of simple
    > implementation: a pattern like "A+" is parsed and analyzed in the raw
    > parser. This makes subsequent process much easier because the pattern
    > element variable (in this case "A") and the quantifier (in this case
    > "+") is already identified by the raw parser. However there are much
    > more cases are allowed in the standard as you already pointed out. For
    > those cases probably we should give up to parse PATTERN items in the
    > raw parser, and instead the raw parser just accepts the elements as
    > Sconst?
    
    Is there a concern that the PATTERN grammar can't be represented in
    Bison? I thought it was all context-free... Or is the concern that the
    parse tree of the pattern is hard to feed into a regex engine?
    
    > Any comments, especially on the PREV/NEXT implementation part is
    > welcome. Currently the DEFINE expression like "price > PREV(price)" is
    > prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
    > in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR.  Then
    > evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
    > previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
    > TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
    > hack and should be gotten ride of before v1 is committed. Better idea?
    
    I'm not familiar enough with this code yet to offer very concrete
    suggestions, sorry... But at some point in the future, we need to be
    able to skip forward and backward from arbitrary points, like
    
        DEFINE B AS B.price > PREV(FIRST(A.price), 3)
    
    so there won't be just one pair of "previous and next" tuples. Maybe
    that can help clarify the design? It feels like it'll need to eventually
    be a "real" function that operates on the window state, even if it
    doesn't support all of the possible complexities in v1.
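    To make the nested navigation concrete, here is a small Python
    sketch (not PostgreSQL code). It assumes that
    PREV(FIRST(A.price), 3) first locates the first row mapped to A and
    then steps three physical rows back from it; that reading of the
    standard, the helper names and the data are all assumptions made for
    illustration.

```python
from typing import List, Optional, Sequence


def first_mapped(classifier: List[Optional[str]], var: str) -> Optional[int]:
    """Index of the first row mapped to pattern variable `var`, or None."""
    for i, name in enumerate(classifier):
        if name == var:
            return i
    return None


def prev_value(values: Sequence[int], base: Optional[int], offset: int):
    """Value `offset` physical rows before index `base` (offset >= 0).

    Returns None (SQL NULL) when there is no base row or the target
    falls before the start of the partition.
    """
    if base is None:
        return None
    target = base - offset
    if target < 0:
        return None
    return values[target]


# Hypothetical partition: the match starts at row 3; rows 0..2 precede it.
prices = [10, 20, 30, 40, 50, 60]
classifier = [None, None, None, "A", "A", "B"]

anchor = first_mapped(classifier, "A")        # row located by FIRST(A.price)
reference = prev_value(prices, anchor, 3)     # three rows before that row
# DEFINE B AS B.price > PREV(FIRST(A.price), 3), checked for the row at index 5
b_ok = reference is not None and prices[5] > reference
```

    The lookup needs random access to rows relative to the match so far,
    not just the neighbours of the current row.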
    
    --
    
    Taking a closer look at the regex engine:
    
    It looks like the + quantifier has trouble when it meets the end of the
    frame. For instance, try removing the last row of the 'stock' table in
    rpr.sql; some of the final matches will disappear unexpectedly. Or try a
    pattern like
    
        PATTERN ( a+ )
         DEFINE a AS TRUE
    
    which doesn't seem to match anything in my testing.
    
    There's also the issue of backtracking in the face of reclassification,
    as I think Vik was alluding to upthread. The pattern
    
        PATTERN ( a+ b+ )
         DEFINE a AS col = 2,
                b AS col = 2
    
    doesn't match a sequence of values (2 2 ...) with the current
    implementation, even with a dummy row at the end to avoid the
    end-of-frame bug.
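    The two cases above can be reproduced with a small greedy matcher
    that backtracks, sketched here in Python. It is not the patch's
    algorithm; it only handles a sequence of variables that each carry a
    "+" quantifier, and match_plus_sequence and its arguments are
    hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Sequence


def match_plus_sequence(
    rows: Sequence[int],
    start: int,
    variables: List[str],
    defines: Dict[str, Callable[[int], bool]],
) -> Optional[List[str]]:
    """Match PATTERN (v1+ v2+ ...) against rows[start:].

    Returns one variable name per matched row, or None if nothing matches.
    Each variable is greedy, but gives rows back when the rest fails.
    """

    def rec(pos: int, vi: int, took_one: bool) -> Optional[List[str]]:
        name = variables[vi]
        if pos < len(rows) and defines[name](rows[pos]):
            # Greedy step: let the current variable take this row first.
            rest = rec(pos + 1, vi, True)
            if rest is not None:
                return [name] + rest
        if took_one:
            if vi + 1 == len(variables):
                return []  # pattern complete, also when the frame has ended
            # Backtrack point: hand the current row to the next variable.
            return rec(pos, vi + 1, False)
        return None

    return rec(start, 0, False)


same = {"a": lambda col: col == 2, "b": lambda col: col == 2}
reclassified = match_plus_sequence([2, 2, 2], 0, ["a", "b"], same)
to_frame_end = match_plus_sequence([1, 2, 3], 0, ["a"], {"a": lambda col: True})
```

    In the first call the last row is initially taken by a and then
    reclassified as b once b+ would otherwise be empty; in the second,
    a+ runs to the end of the frame and still counts as a match.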
    
    (I've attached two failing tests against v2, to hopefully better
    illustrate, along with what I _think_ should be the correct results.)
    
    I'm not quite understanding the match loop in evaluate_pattern(). It
    looks like we're building up a string to pass to the regex engine, but
    by the time we call regexp_instr, don't we already know whether or not the
    pattern will match based on the expression evaluation we've done?
    
    Thanks,
    --Jacob
  12. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-07-21T00:07:44Z

    On 7/21/23 01:36, Jacob Champion wrote:
    > There's also the issue of backtracking in the face of reclassification,
    > as I think Vik was alluding to upthread.
    
    We definitely need some kind of backtracking or other reclassification 
    method.
    
    > (I've attached two failing tests against v2, to hopefully better
    > illustrate, along with what I_think_  should be the correct results.)
    
    Almost.  You are matching 07-01-2023 but the condition is "price > 100".
    -- 
    Vik Fearing
    
    
    
    
    
  13. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-21T06:16:48Z

    Hi,
    
    > Hi Ishii-san,
    > 
    > On 7/19/23 22:15, Tatsuo Ishii wrote:
    >> Currently my patch has a limitation for the sake of simple
    >> implementation: a pattern like "A+" is parsed and analyzed in the raw
    >> parser. This makes subsequent process much easier because the pattern
    >> element variable (in this case "A") and the quantifier (in this case
    >> "+") is already identified by the raw parser. However there are much
    >> more cases are allowed in the standard as you already pointed out. For
    >> those cases probably we should give up to parse PATTERN items in the
    >> raw parser, and instead the raw parser just accepts the elements as
    >> Sconst?
    > 
    > Is there a concern that the PATTERN grammar can't be represented in
    > Bison? I thought it was all context-free...
    
    I don't know at this point. I think being context-free is not enough
    for a grammar to be represented in Bison; the grammar also needs to be
    LALR(1).  Moreover, adding the grammar to the existing parser may
    generate shift/reduce conflicts.
    
    > Or is the concern that the
    > parse tree of the pattern is hard to feed into a regex engine?
    
    One small concern is how to convert pattern variables to a regular
    expression which our regex engine understands. Suppose,
    
    PATTERN UP+
    
    Currently I convert "UP+" to "U+" so that it can be fed to the regexp
    engine. In order to do that, we need to know which part of the pattern
    (UP+) is the pattern variable ("UP"). For "UP+" it's quite easy, but
    for more complex regular expressions it would not be, unless the
    PATTERN grammar can be analyzed by our parser to know which part is
    the pattern variable.
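    As an illustration of the conversion, the following Python sketch
    rewrites a PATTERN body so that every variable becomes one letter
    while quantifiers pass through untouched. It assigns letters by
    position in the DEFINE list instead of using the first character (so
    that names sharing an initial cannot collide); that choice and the
    name encode_pattern are assumptions, not what the patch does.

```python
import re
import string
from typing import Dict, List, Tuple


def encode_pattern(pattern: str, variables: List[str]) -> Tuple[str, Dict[str, str]]:
    """Rewrite a PATTERN body so each pattern variable is a single letter.

    Returns the encoded regular expression and the variable-to-letter map.
    """
    if len(variables) > len(string.ascii_lowercase):
        raise ValueError("too many pattern variables for one-letter encoding")
    letter = {name: string.ascii_lowercase[i] for i, name in enumerate(variables)}

    def replace(match: "re.Match[str]") -> str:
        name = match.group(0)
        if name not in letter:
            raise ValueError(f"undefined pattern variable: {name}")
        return letter[name]

    # Identifiers are replaced; quantifiers such as + or {2} are left alone.
    encoded = re.sub(r"[A-Za-z_][A-Za-z_0-9]*", replace, pattern)
    return encoded.replace(" ", ""), letter


v_regex, v_letters = encode_pattern("START DOWN+ UP+", ["START", "UP", "DOWN"])
n_regex, _ = encode_pattern("A{2} B+", ["A", "B"])
```

    Splitting the pattern into identifiers and quantifiers is the step
    that requires the parser to know the PATTERN grammar; the regular
    expression above is only a stand-in for that knowledge.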
    
    >> Any comments, especially on the PREV/NEXT implementation part is
    >> welcome. Currently the DEFINE expression like "price > PREV(price)" is
    >> prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
    >> in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR.  Then
    >> evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
    >> previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
    >> TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
    >> hack and should be gotten ride of before v1 is committed. Better idea?
    > 
    > I'm not familiar enough with this code yet to offer very concrete
    > suggestions, sorry... But at some point in the future, we need to be
    > able to skip forward and backward from arbitrary points, like
    > 
    >     DEFINE B AS B.price > PREV(FIRST(A.price), 3)
    > 
    > so there won't be just one pair of "previous and next" tuples.
    
    Yes, I know.
    
    > Maybe
    > that can help clarify the design? It feels like it'll need to eventually
    > be a "real" function that operates on the window state, even if it
    > doesn't support all of the possible complexities in v1.
    
    Unfortunately a window function cannot call other window functions.
    
    > Taking a closer look at the regex engine:
    > 
    > It looks like the + qualifier has trouble when it meets the end of the
    > frame. For instance, try removing the last row of the 'stock' table in
    > rpr.sql; some of the final matches will disappear unexpectedly. Or try a
    > pattern like
    > 
    >     PATTERN ( a+ )
    >      DEFINE a AS TRUE
    > 
    > which doesn't seem to match anything in my testing.
    > 
    > There's also the issue of backtracking in the face of reclassification,
    > as I think Vik was alluding to upthread. The pattern
    > 
    >     PATTERN ( a+ b+ )
    >      DEFINE a AS col = 2,
    >             b AS col = 2
    > 
    > doesn't match a sequence of values (2 2 ...) with the current
    > implementation, even with a dummy row at the end to avoid the
    > end-of-frame bug.
    > 
    > (I've attached two failing tests against v2, to hopefully better
    > illustrate, along with what I _think_ should be the correct results.)
    
    Thanks. I will look into this.
    
    > I'm not quite understanding the match loop in evaluate_pattern(). It
    > looks like we're building up a string to pass to the regex engine, but
    > by the we call regexp_instr, don't we already know whether or not the
    > pattern will match based on the expression evaluation we've done?
    
    For "+" yes. But for more complex regular expressions like '{n}', we
    need to call our regexp engine to check if the pattern matches.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  14. Re: Row pattern recognition

    Jacob Champion <jchampion@timescale.com> — 2023-07-21T23:14:12Z

    On 7/20/23 17:07, Vik Fearing wrote:
    > On 7/21/23 01:36, Jacob Champion wrote:
    >> (I've attached two failing tests against v2, to hopefully better
    >> illustrate, along with what I_think_  should be the correct results.)
    > 
    > Almost.  You are matching 07-01-2023 but the condition is "price > 100".
    
    D'oh. Correction attached. I think :)
    
    Thanks,
    --Jacob
  15. Re: Row pattern recognition

    Jacob Champion <jchampion@timescale.com> — 2023-07-21T23:16:18Z

    On 7/20/23 23:16, Tatsuo Ishii wrote:
    > I don't know at this point. I think context-free is not enough to be
    > repsented in Bison. The grammer also needs to be LALR(1).  Moreover,
    > adding the grammer to existing parser may generate shift/reduce
    > errors.
    
    Ah. It's been too long since my compilers classes; I will pipe down.
    
    > One small concern is how to convert pattern variables to regex
    > expression which our regex enginer understands. Suppose,
    > 
    > PATTERN UP+
    > 
    > Currently I convert "UP+" to "U+" so that it can be fed to the regexp
    > engine. In order to do that, we need to know which part of the pattern
    > (UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
    > for more complex regular expressions it would be not, unless PATTERN
    > grammer can be analyzed by our parser to know which part is the
    > pattern variable.
    
    Is the eventual plan to generate multiple alternatives, and run the
    regex against them one at a time?
    
    >> I'm not familiar enough with this code yet to offer very concrete
    >> suggestions, sorry... But at some point in the future, we need to be
    >> able to skip forward and backward from arbitrary points, like
    >>
    >>     DEFINE B AS B.price > PREV(FIRST(A.price), 3)
    >>
    >> so there won't be just one pair of "previous and next" tuples.
    > 
    > Yes, I know.
    
    I apologize. I got overexplain-y.
    
    >> Maybe
    >> that can help clarify the design? It feels like it'll need to eventually
    >> be a "real" function that operates on the window state, even if it
    >> doesn't support all of the possible complexities in v1.
    > 
    > Unfortunately an window function can not call other window functions.
    
    Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case? Or
    does it make sense to split the pattern navigation "functions" into
    their own new concept, and maybe borrow some of the window function
    infrastructure for it?
    
    Thanks!
    --Jacob
    
    
    
    
  16. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-07-21T23:38:01Z

    On 7/22/23 01:14, Jacob Champion wrote:
    > On 7/20/23 17:07, Vik Fearing wrote:
    >> On 7/21/23 01:36, Jacob Champion wrote:
    >>> (I've attached two failing tests against v2, to hopefully better
    >>> illustrate, along with what I_think_  should be the correct results.)
    >>
    >> Almost.  You are matching 07-01-2023 but the condition is "price > 100".
    > 
    > D'oh. Correction attached. I think :)
    
    This looks correct to my human brain.  Thanks!
    -- 
    Vik Fearing
    
    
    
    
    
  17. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-22T01:11:49Z

    >> One small concern is how to convert pattern variables to regex
    >> expression which our regex enginer understands. Suppose,
    >> 
    >> PATTERN UP+
    >> 
    >> Currently I convert "UP+" to "U+" so that it can be fed to the regexp
    >> engine. In order to do that, we need to know which part of the pattern
    >> (UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
    >> for more complex regular expressions it would be not, unless PATTERN
    >> grammer can be analyzed by our parser to know which part is the
    >> pattern variable.
    > 
    > Is the eventual plan to generate multiple alternatives, and run the
    > regex against them one at a time?
    
    Yes, that's my plan.
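    One way to picture "multiple alternatives" is sketched below in
    Python: each row is labelled with every variable whose DEFINE
    condition it satisfies, every combination is turned into a string,
    and the encoded pattern is tried against each string. This is a
    brute-force illustration with hypothetical names; the number of
    strings grows exponentially, and it checks only whether the whole
    run of rows matches, not the longest matching prefix.

```python
import itertools
import re


def candidate_strings(rows, defines, letter):
    """Yield every one-letter-per-row labelling the DEFINE conditions allow."""
    per_row = []
    for i, row in enumerate(rows):
        prev = rows[i - 1] if i > 0 else None
        options = [letter[name] for name, cond in defines.items() if cond(row, prev)]
        if not options:
            return  # some row satisfies no variable, so there is nothing to try
        per_row.append(options)
    for combo in itertools.product(*per_row):
        yield "".join(combo)


def frame_matches(encoded_pattern, rows, defines, letter):
    """True if any labelling of the rows matches the encoded pattern in full."""
    regex = re.compile(encoded_pattern)
    return any(regex.fullmatch(s) for s in candidate_strings(rows, defines, letter))


# Rows are plain prices here; START is always true, so most rows have two labels.
defines = {
    "START": lambda row, prev: True,
    "UP": lambda row, prev: prev is not None and row > prev,
    "DOWN": lambda row, prev: prev is not None and row < prev,
}
letter = {"START": "a", "UP": "b", "DOWN": "c"}

v_shape = frame_matches("ac+b+", [100, 90, 80, 95, 110], defines, letter)
rising = frame_matches("ac+b+", [100, 110, 120], defines, letter)
```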
    
    >>> I'm not familiar enough with this code yet to offer very concrete
    >>> suggestions, sorry... But at some point in the future, we need to be
    >>> able to skip forward and backward from arbitrary points, like
    >>>
    >>>     DEFINE B AS B.price > PREV(FIRST(A.price), 3)
    >>>
    >>> so there won't be just one pair of "previous and next" tuples.
    >> 
    >> Yes, I know.
    > 
    > I apologize. I got overexplain-y.
    
    No problem. Thank you for reminding me of it.
    
    >>> Maybe
    >>> that can help clarify the design? It feels like it'll need to eventually
    >>> be a "real" function that operates on the window state, even if it
    >>> doesn't support all of the possible complexities in v1.
    >> 
    >> Unfortunately an window function can not call other window functions.
    > 
    > Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?
    
    I am not sure at this point. The current PostgreSQL executor creates a
    WindowStatePerFuncData for each window function and aggregate
    appearing in an OVER clause. This means PREV/NEXT and other row pattern
    navigation operators cannot have their own WindowStatePerFuncData if
    they do not appear in OVER clauses in a query, even if PREV/NEXT
    etc. are defined as window functions.
    
    > Or
    > does it make sense to split the pattern navigation "functions" into
    > their own new concept, and maybe borrow some of the window function
    > infrastructure for it?
    
    Maybe. Suppose a window function executes row pattern matching using
    price > PREV(price). The window function already receives
    WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
    PREV, we could let PREV do the real work (getting previous tuple).
    I have not tried this yet, though.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  18. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-07-22T02:54:43Z

    On 7/22/23 03:11, Tatsuo Ishii wrote:
    >>>> Maybe
    >>>> that can help clarify the design? It feels like it'll need to eventually
    >>>> be a "real" function that operates on the window state, even if it
    >>>> doesn't support all of the possible complexities in v1.
    >>> Unfortunately an window function can not call other window functions.
    >> Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?
    
    > I am not sure at this point. Current PostgreSQL executor creates
    > WindowStatePerFuncData for each window function and aggregate
    > appearing in OVER clause. This means PREV/NEXT and other row pattern
    > navigation operators cannot have their own WindowStatePerFuncData if
    > they do not appear in OVER clauses in a query even if PREV/NEXT
    > etc. are defined as window function.
    > 
    >> Or
    >> does it make sense to split the pattern navigation "functions" into
    >> their own new concept, and maybe borrow some of the window function
    >> infrastructure for it?
    
    > Maybe. Suppose a window function executes row pattern matching using
    > price > PREV(price). The window function already receives
    > WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
    > PREV, we could let PREV do the real work (getting previous tuple).
    > I have not tried this yet, though.
    
    
    I don't understand this logic.  Window functions work over a window 
    frame.  What we are talking about here is *defining* a window frame. 
    How can a window function execute row pattern matching?
    -- 
    Vik Fearing
    
    
    
    
    
  19. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-22T06:14:46Z

    > On 7/22/23 03:11, Tatsuo Ishii wrote:
    >>>>> Maybe
    >>>>> that can help clarify the design? It feels like it'll need to
    >>>>> eventually
    >>>>> be a "real" function that operates on the window state, even if it
    >>>>> doesn't support all of the possible complexities in v1.
    >>>> Unfortunately an window function can not call other window functions.
    >>> Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?
    > 
    >> I am not sure at this point. Current PostgreSQL executor creates
    >> WindowStatePerFuncData for each window function and aggregate
    >> appearing in OVER clause. This means PREV/NEXT and other row pattern
    >> navigation operators cannot have their own WindowStatePerFuncData if
    >> they do not appear in OVER clauses in a query even if PREV/NEXT
    >> etc. are defined as window function.
    >> 
    >>> Or
    >>> does it make sense to split the pattern navigation "functions" into
    >>> their own new concept, and maybe borrow some of the window function
    >>> infrastructure for it?
    > 
    >> Maybe. Suppose a window function executes row pattern matching using
    >> price > PREV(price). The window function already receives
    >> WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
    >> PREV, we could let PREV do the real work (getting previous tuple).
    >> I have not tried this yet, though.
    > 
    > 
    > I don't understand this logic.  Window functions work over a window
    > frame.
    
    Yes.
    
    > What we are talking about here is *defining* a window
    > frame.
    
    Well, we are defining a "reduced" window frame within a (full) window
    frame. A "reduced" window frame is calculated each time a window
    function is called.
    
    > How can a window function execute row pattern matching?
    
    A window function is called for each row fed by an outer plan. It
    fetches the current, previous and next rows to execute pattern
    matching. If the pattern matches, the window function moves to the
    next row and repeats the process until the pattern match fails.
    
    Below is an example window function that executes pattern matching (I
    will include this in the v3 patch). row_is_in_reduced_frame() is the
    function that executes pattern matching. It returns the number of rows
    in the reduced frame if the pattern match succeeds. In that case,
    window_last_value() returns the value at the last row in the reduced
    frame instead of the last row in the full window frame.
    
    /*
     * last_value
     * return the value of VE evaluated on the last row of the
     * window frame, per spec.
     */
    Datum
    window_last_value(PG_FUNCTION_ARGS)
    {
    	WindowObject winobj = PG_WINDOW_OBJECT();
    	Datum		result;
    	bool		isnull;
    	int64		abspos;
    	int			num_reduced_frame;
    
    	abspos = WinGetCurrentPosition(winobj);
    	num_reduced_frame = row_is_in_reduced_frame(winobj, abspos);
    
    	if (num_reduced_frame == 0)
    		/* no RPR is involved */
    		result = WinGetFuncArgInFrame(winobj, 0,
    									  0, WINDOW_SEEK_TAIL, true,
    									  &isnull, NULL);
    	else if (num_reduced_frame > 0)
    		/* get last row value in the reduced frame */
    		result = WinGetFuncArgInFrame(winobj, 0,
    									  num_reduced_frame - 1, WINDOW_SEEK_HEAD, true,
    									  &isnull, NULL);
    	else
    		/* RPR is involved and current row is unmatched or skipped */
    		isnull = true;
    
    	if (isnull)
    		PG_RETURN_NULL();
    
    	PG_RETURN_DATUM(result);
    }
    
    
    
    
  20. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-07-23T21:29:46Z

    On 7/22/23 08:14, Tatsuo Ishii wrote:
    >> On 7/22/23 03:11, Tatsuo Ishii wrote:
    >>> Maybe. Suppose a window function executes row pattern matching using
    >>> price > PREV(price). The window function already receives
    >>> WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
    >>> PREV, we could let PREV do the real work (getting previous tuple).
    >>> I have not tried this yet, though.
    >>
    >> I don't understand this logic.  Window functions work over a window
    >> frame.
    > 
    > Yes.
    > 
    >> What we are talking about here is *defining* a window
    >> frame.
    > 
    > Well, we are defining a "reduced" window frame within a (full) window
    > frame. A "reduced" window frame is calculated each time when a window
    > function is called.
    
    
    Why?  It should only be recalculated when the current row changes and we 
    need a new frame.  The reduced window frame *is* the window frame for 
    all functions over that window.
    
    
    >> How can a window function execute row pattern matching?
    > 
    > A window function is called for each row fed by an outer plan. It
    > fetches current, previous and next row to execute pattern matching. If
    > it matches, the window function moves to next row and repeat the
    > process, until pattern match fails.
    > 
    > Below is an example window function to execute pattern matching (I
    > will include this in the v3 patch). row_is_in_reduced_frame() is a
    > function to execute pattern matching. It returns the number of rows in
    > the reduced frame if pattern match succeeds. If succeeds, the function
    > returns the last row in the reduced frame instead of the last row in
    > the full window frame.
    
    
    I strongly disagree with this.  Window functions do not need to know how 
    the frame is defined, and indeed they should not.  WinGetFuncArgInFrame 
    should answer yes or no and the window function just works on that. 
    Otherwise we will get extension (and possibly even core) functions that 
    don't handle the frame properly.
    -- 
    Vik Fearing
    
    
    
    
    
  21. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-24T00:22:40Z

    >>> What we are talking about here is *defining* a window
    >>> frame.
    >> Well, we are defining a "reduced" window frame within a (full) window
    >> frame. A "reduced" window frame is calculated each time when a window
    >> function is called.
    > 
    > 
    > Why?  It should only be recalculated when the current row changes and
    > we need a new frame.  The reduced window frame *is* the window frame
    > for all functions over that window.
    
    We already recalculate the frame each time a row is processed, even
    without RPR. See ExecWindowAgg.
    
    Also, RPR always requires the frame option ROWS BETWEEN CURRENT ROW,
    which means the frame head changes each time the current row position
    changes.
    
    > I strongly disagree with this.  Window function do not need to know
    > how the frame is defined, and indeed they should not.
    
    We already break the rule by defining *support functions. See
    windowfuncs.c.
    
    > WinGetFuncArgInFrame should answer yes or no and the window function
    > just works on that. Otherwise we will get extension (and possibly even
    > core) functions that don't handle the frame properly.
    
    Maybe I can move row_is_in_reduced_frame into WinGetFuncArgInFrame
    just for convenience.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  22. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-07-24T23:14:37Z

    On 7/24/23 02:22, Tatsuo Ishii wrote:
    >>>> What we are talking about here is *defining* a window
    >>>> frame.
    >>> Well, we are defining a "reduced" window frame within a (full) window
    >>> frame. A "reduced" window frame is calculated each time when a window
    >>> function is called.
    >>
    >>
    >> Why?  It should only be recalculated when the current row changes and
    >> we need a new frame.  The reduced window frame *is* the window frame
    >> for all functions over that window.
    > 
    > We already recalculate a frame each time a row is processed even
    > without RPR. See ExecWindowAgg.
    
    Yes, after each row.  Not for each function.
    
    > Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
    > which means the frame head is changed each time current row position
    > changes.
    
    Off topic for now: I wonder why this restriction is in place and whether 
    we should respect or ignore it.  That is a discussion for another time, 
    though.
    
    >> I strongly disagree with this.  Window function do not need to know
    >> how the frame is defined, and indeed they should not.
    > 
    > We already break the rule by defining *support functions. See
    > windowfuncs.c.
    
    The support functions don't know anything about the frame, they just 
    know when a window function is monotonically increasing and execution 
    can either stop or be "passed through".
    
    >> WinGetFuncArgInFrame should answer yes or no and the window function
    >> just works on that. Otherwise we will get extension (and possibly even
    >> core) functions that don't handle the frame properly.
    > 
    > Maybe I can move row_is_in_reduced_frame into WinGetFuncArgInFrame
    > just for convenience.
    
    I have two comments about this:
    
    It isn't just for convenience, it is for correctness.  The window 
    functions do not need to know which rows they are *not* operating on.
    
    There is no such thing as a "full" or "reduced" frame.  The standard 
    uses those terms to explain the difference between before and after RPR 
    is applied, but window functions do not get to choose which frame they 
    apply over.  They only ever apply over the reduced window frame.
    -- 
    Vik Fearing
    
    
    
    
    
  23. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-25T12:35:04Z

    Hi,
    
    > diff --git a/src/test/regress/expected/rpr.out b/src/test/regress/expected/rpr.out
    > index 6bf8818911..f3fd22de2a 100644
    > --- a/src/test/regress/expected/rpr.out
    > +++ b/src/test/regress/expected/rpr.out
    > @@ -230,6 +230,79 @@ SELECT company, tdate, price, rpr(price) OVER w FROM stock
    >   company2 | 07-10-2023 |  1300 |     
    >  (20 rows)
    >  
    > +-- match everything
    > +SELECT company, tdate, price, rpr(price) OVER w FROM stock
    > + WINDOW w AS (
    > + PARTITION BY company
    > + ORDER BY tdate
    > + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    > + AFTER MATCH SKIP TO NEXT ROW
    
    This looks like the result for AFTER MATCH SKIP PAST LAST ROW.
    
    > + INITIAL
    > + PATTERN (A+)
    > + DEFINE
    > +  A AS TRUE
    > +);
    > + company  |   tdate    | price | rpr 
    > +----------+------------+-------+-----
    > + company1 | 07-01-2023 |   100 | 100
    > + company1 | 07-02-2023 |   200 |    
    > + company1 | 07-03-2023 |   150 |    
    > + company1 | 07-04-2023 |   140 |    
    > + company1 | 07-05-2023 |   150 |    
    > + company1 | 07-06-2023 |    90 |    
    > + company1 | 07-07-2023 |   110 |    
    > + company1 | 07-08-2023 |   130 |    
    > + company1 | 07-09-2023 |   120 |    
    > + company1 | 07-10-2023 |   130 |    
    > + company2 | 07-01-2023 |    50 |  50
    > + company2 | 07-02-2023 |  2000 |    
    > + company2 | 07-03-2023 |  1500 |    
    > + company2 | 07-04-2023 |  1400 |    
    > + company2 | 07-05-2023 |  1500 |    
    > + company2 | 07-06-2023 |    60 |    
    > + company2 | 07-07-2023 |  1100 |    
    > + company2 | 07-08-2023 |  1300 |    
    > + company2 | 07-09-2023 |  1200 |    
    > + company2 | 07-10-2023 |  1300 |    
    > +(20 rows)
    > +
    > +-- backtracking with reclassification of rows
    > +SELECT company, tdate, price, rpr(price) OVER w FROM stock
    > + WINDOW w AS (
    > + PARTITION BY company
    > + ORDER BY tdate
    > + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    > + AFTER MATCH SKIP TO NEXT ROW
    > + INITIAL
    > + PATTERN (A+ B+)
    > + DEFINE
    > +  A AS price > 100,
    > +  B AS price > 100
    > +);
    > + company  |   tdate    | price | rpr  
    > +----------+------------+-------+------
    > + company1 | 07-01-2023 |   100 |     
    > + company1 | 07-02-2023 |   200 |  200
    > + company1 | 07-03-2023 |   150 |     
    > + company1 | 07-04-2023 |   140 |     
    > + company1 | 07-05-2023 |   150 |     
    > + company1 | 07-06-2023 |    90 |     
    > + company1 | 07-07-2023 |   110 |  110
    > + company1 | 07-08-2023 |   130 |     
    > + company1 | 07-09-2023 |   120 |     
    > + company1 | 07-10-2023 |   130 |     
    > + company2 | 07-01-2023 |    50 |     
    > + company2 | 07-02-2023 |  2000 | 2000
    > + company2 | 07-03-2023 |  1500 |     
    > + company2 | 07-04-2023 |  1400 |     
    > + company2 | 07-05-2023 |  1500 |     
    > + company2 | 07-06-2023 |    60 |     
    > + company2 | 07-07-2023 |  1100 | 1100
    > + company2 | 07-08-2023 |  1300 |     
    > + company2 | 07-09-2023 |  1200 |     
    > + company2 | 07-10-2023 |  1300 |     
    > +(20 rows)
    > +
    >  --
    >  -- Error cases
    >  --
    > diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql
    > index 951c9abfe9..f1cd0369f4 100644
    > --- a/src/test/regress/sql/rpr.sql
    > +++ b/src/test/regress/sql/rpr.sql
    > @@ -94,6 +94,33 @@ SELECT company, tdate, price, rpr(price) OVER w FROM stock
    >    UPDOWN AS price > PREV(price) AND price > NEXT(price)
    >  );
    >  
    > +-- match everything
    > +SELECT company, tdate, price, rpr(price) OVER w FROM stock
    > + WINDOW w AS (
    > + PARTITION BY company
    > + ORDER BY tdate
    > + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    > + AFTER MATCH SKIP TO NEXT ROW
    > + INITIAL
    > + PATTERN (A+)
    > + DEFINE
    > +  A AS TRUE
    > +);
    > +
    > +-- backtracking with reclassification of rows
    > +SELECT company, tdate, price, rpr(price) OVER w FROM stock
    > + WINDOW w AS (
    > + PARTITION BY company
    > + ORDER BY tdate
    > + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    > + AFTER MATCH SKIP TO NEXT ROW
    > + INITIAL
    > + PATTERN (A+ B+)
    > + DEFINE
    > +  A AS price > 100,
    > +  B AS price > 100
    > +);
    > +
    >  --
    >  -- Error cases
    >  --
    
    
    
    
  24. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-26T12:21:34Z

    Attached is the v3 patch. The following changes are made in this patch.
    
    (1) I completely changed the pattern matching engine so that it
    performs backtracking. Now the engine evaluates all pattern elements
    defined in PATTERN against each row, saving the matched pattern
    variables in a string per row. For example, if pattern elements A and
    B evaluate to true, the string "AB" is created for the current row.
    
    This continues until all pattern matching fails or the engine reaches
    the end of the full window frame/partition. After that, the pattern
    matching engine creates all possible "pattern strings" and applies
    regular expression matching to each. For example, if we have row 0 =
    "AB" and row 1 = "C", the possible pattern strings are "AC" and "BC".
    
    If a pattern string matches, the length of the matching substring is
    saved. After all possible trials are done, the longest matching
    substring is chosen and its length becomes the width (number of rows)
    of the reduced window frame.
    
    See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
    in nodeWindowAgg.c for more details. For now I use a naive depth-first
    search, and there is probably a lot of room for optimization (for
    example, rewriting it without recursion). Suggestions/patches are
    welcome.
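
    As a rough illustration of the approach described in (1), here is a
    minimal standalone C sketch. It is not the patch's code: the names
    reduced_frame_width and try_pattern_strings are hypothetical, each
    pattern variable is assumed to be a single character, and the PATTERN
    clause is assumed to have been translated already into an anchored
    POSIX extended regular expression such as "^A+B+". Like the naive
    search mentioned above, it is exponential in the number of rows.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#define MAX_ROWS 64				/* arbitrary limit for this sketch */

/*
 * Depth-first enumeration of every "pattern string": one variable is
 * picked from each row's set, and the regex is tried on each complete
 * string.  *best keeps the longest match that starts at offset 0.
 */
static void
try_pattern_strings(const char *const *row_vars, int nrows, int depth,
					char *buf, const regex_t *re, int *best)
{
	if (depth == nrows)
	{
		regmatch_t	m;

		buf[depth] = '\0';
		if (regexec(re, buf, 1, &m, 0) == 0 &&
			m.rm_so == 0 && (int) m.rm_eo > *best)
			*best = (int) m.rm_eo;
		return;
	}

	for (const char *p = row_vars[depth]; *p != '\0'; p++)
	{
		buf[depth] = *p;
		try_pattern_strings(row_vars, nrows, depth + 1, buf, re, best);
	}
}

/*
 * row_vars[i] holds the variables that evaluated to true for row i,
 * e.g. "AB".  Returns the number of rows in the longest match, 0 if
 * nothing matches, or -1 on error.
 */
int
reduced_frame_width(const char *const *row_vars, int nrows,
					const char *pattern_regex)
{
	regex_t		re;
	char		buf[MAX_ROWS + 1];
	int			best = 0;

	assert(row_vars != NULL && pattern_regex != NULL);

	/* A row that satisfies no variable ends the candidate rows. */
	for (int i = 0; i < nrows; i++)
	{
		if (row_vars[i][0] == '\0')
		{
			nrows = i;
			break;
		}
	}

	if (nrows > MAX_ROWS || regcomp(&re, pattern_regex, REG_EXTENDED) != 0)
		return -1;

	try_pattern_strings(row_vars, nrows, 0, buf, &re, &best);
	regfree(&re);
	return best;
}
```

    With rows {"AB", "C"} and the regex "^A+C", only the pattern string
    "AC" matches, so the sketch reports a width of 2.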
    
    Jacob Champion wrote:
    > It looks like the + qualifier has trouble when it meets the end of the
    > frame. For instance, try removing the last row of the 'stock' table in
    > rpr.sql; some of the final matches will disappear unexpectedly. Or try a
    > pattern like
    > 
    >     PATTERN ( a+ )
    >      DEFINE a AS TRUE
    > 
    > which doesn't seem to match anything in my testing.
    > 
    > There's also the issue of backtracking in the face of reclassification,
    > as I think Vik was alluding to upthread. The pattern
    > 
    >     PATTERN ( a+ b+ )
    >      DEFINE a AS col = 2,
    >             b AS col = 2
    
    With the new engine, the cases above no longer fail. See the new
    regression test cases. Thanks for providing valuable test cases!
    
    (2) Make window functions RPR aware. Now first_value, last_value, and
    nth_value recognize RPR (maybe first_value does not need any change?)
    
    Vik Fearing wrote:
    > I strongly disagree with this.  Window function do not need to know
    > how the frame is defined, and indeed they should not.
    > WinGetFuncArgInFrame should answer yes or no and the window function
    > just works on that. Otherwise we will get extension (and possibly even
    > core) functions that don't handle the frame properly.
    
    So I moved row_is_in_reduced_frame into WinGetFuncArgInFrame so that
    those window functions do not need to be changed.
    
    (3) Window function rpr was removed. We can use first_value instead.
    
    (4) Remaining tasks/issues.
    
    - For now I disable WinSetMarkPosition because RPR often needs to
      access a row before the mark is set. We need to fix this in the
      future.
    
    - I am working on making window aggregates RPR aware now. The
      implementation is in progress and far from complete. An example
      is below. I think row 2, 3, 4 of the "count" column should be NULL
      instead of 3, 2, 0, 0. The same can be said of the other
      rows. Probably this is an effect of moving aggregates, but I am
      still studying the window aggregation code.
    
    SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM stock
     WINDOW w AS (
     PARTITION BY company
     ORDER BY tdate
     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
     AFTER MATCH SKIP PAST LAST ROW
     INITIAL
     PATTERN (START UP+ DOWN+)
     DEFINE
      START AS TRUE,
      UP AS price > PREV(price),
      DOWN AS price < PREV(price)
    );
     company  |   tdate    | first_value | count 
    ----------+------------+-------------+-------
     company1 | 2023-07-01 |         100 |     4
     company1 | 2023-07-02 |             |     3
     company1 | 2023-07-03 |             |     2
     company1 | 2023-07-04 |             |     0
     company1 | 2023-07-05 |             |     0
     company1 | 2023-07-06 |          90 |     4
     company1 | 2023-07-07 |             |     3
     company1 | 2023-07-08 |             |     2
     company1 | 2023-07-09 |             |     0
     company1 | 2023-07-10 |             |     0
     company2 | 2023-07-01 |          50 |     4
     company2 | 2023-07-02 |             |     3
     company2 | 2023-07-03 |             |     2
     company2 | 2023-07-04 |             |     0
     company2 | 2023-07-05 |             |     0
     company2 | 2023-07-06 |          60 |     4
     company2 | 2023-07-07 |             |     3
     company2 | 2023-07-08 |             |     2
     company2 | 2023-07-09 |             |     0
     company2 | 2023-07-10 |             |     0
    
    - If attributes appearing in DEFINE are not used in the target list, the query fails.
    
    SELECT company, tdate, count(*) OVER w FROM stock
     WINDOW w AS (
     PARTITION BY company
     ORDER BY tdate
     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
     AFTER MATCH SKIP PAST LAST ROW
     INITIAL
     PATTERN (START UP+ DOWN+)
     DEFINE
      START AS TRUE,
      UP AS price > PREV(price),
      DOWN AS price < PREV(price)
    );
    ERROR:  attribute number 3 exceeds number of columns 2
    
    This is because attributes appearing in DEFINE are not added to the
    target list. I am looking for a way to teach the planner to add
    attributes appearing in DEFINE.
    
    I am going to add this thread to CommitFest and plan to add both of
    you as reviewers. Thanks in advance.
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  25. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-26T20:22:30Z

    > I am going to add this thread to CommitFest and plan to add both of
    > you as reviewers. Thanks in advance.
    
    Done.
    https://commitfest.postgresql.org/44/4460/
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  26. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-28T07:09:53Z

    >> We already recalculate a frame each time a row is processed even
    >> without RPR. See ExecWindowAgg.
    > 
    > Yes, after each row.  Not for each function.
    
    OK, I understand now. Looking more closely at the code, I realized that
    each window function calls update_frameheadpos, which computes the
    frame head position. But it first checks winstate->framehead_valid, and
    if that is already true (probably set by another window function), it
    does nothing.
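
    The validity-flag idea can be shown with a small standalone C sketch.
    This is a simplified illustration, not the executor code: FrameState,
    advance_current_row, update_framehead and the recomputations counter
    are hypothetical names, and the frame head is assumed to be the current
    row (as with ROWS BETWEEN CURRENT ROW AND ...).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct FrameState
{
	int64_t		currentpos;		/* position of the current row */
	int64_t		frameheadpos;	/* cached frame head position */
	bool		framehead_valid;	/* is frameheadpos up to date? */
	int			recomputations;	/* instrumentation for this sketch only */
} FrameState;

/* Moving to the next row invalidates the cached frame head. */
void
advance_current_row(FrameState *st)
{
	assert(st != NULL);
	st->currentpos++;
	st->framehead_valid = false;
}

/*
 * Called by every window function that shares the window.  Only the
 * first call after a row change does any work; later calls return
 * immediately because the flag is already set.
 */
void
update_framehead(FrameState *st)
{
	assert(st != NULL);
	if (st->framehead_valid)
		return;

	st->frameheadpos = st->currentpos;	/* frame starts at CURRENT ROW */
	st->framehead_valid = true;
	st->recomputations++;
}
```

    In this sketch, three functions calling update_framehead on the same
    row cause one computation, which is the "after each row, not for each
    function" behavior discussed above.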
    
    >> Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
    >> which means the frame head is changed each time current row position
    >> changes.
    > 
    > Off topic for now: I wonder why this restriction is in place and
    > whether we should respect or ignore it.  That is a discussion for
    > another time, though.
    
    My guess is that a frame other than ROWS BETWEEN CURRENT ROW has
    little or no meaning. Consider the following example:
    
    SELECT i, first_value(i) OVER w
      FROM (VALUES (1), (2), (3), (4)) AS v (i)
      WINDOW w AS (
       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
       AFTER MATCH SKIP PAST LAST ROW
       PATTERN (A)
       DEFINE
        A AS i = 1 OR i = 3
    );
    
    In this example ROWS BETWEEN CURRENT ROW gives frames with i = 1 and i
    = 3.
    
     i | first_value 
    ---+-------------
     1 |           1
     2 |            
     3 |           3
     4 |            
    (4 rows)
    
    But what would happen with ROWS BETWEEN UNBOUNDED PRECEDING AND
    UNBOUNDED FOLLOWING?  Probably the frame at i = 3 would be missed,
    because at i = 2 PATTERN is not satisfied and computation of the
    reduced frame stops.
    
     i | first_value 
    ---+-------------
     1 |           1
     2 |            
     3 |           
     4 |            
    (4 rows)
    
    This is not very useful for users.
    
    >>> I strongly disagree with this.  Window function do not need to know
    >>> how the frame is defined, and indeed they should not.
    >> We already break the rule by defining *support functions. See
    >> windowfuncs.c.
    > The support functions don't know anything about the frame, they just
    > know when a window function is monotonically increasing and execution
    > can either stop or be "passed through".
    
    I see the following code in window_row_number_support:
    
    		/*
    		 * The frame options can always become "ROWS BETWEEN UNBOUNDED
    		 * PRECEDING AND CURRENT ROW".  row_number() always just increments by
    		 * 1 with each row in the partition.  Using ROWS instead of RANGE
    		 * saves effort checking peer rows during execution.
    		 */
    		req->frameOptions = (FRAMEOPTION_NONDEFAULT |
    							 FRAMEOPTION_ROWS |
    							 FRAMEOPTION_START_UNBOUNDED_PRECEDING |
    							 FRAMEOPTION_END_CURRENT_ROW);
    
    I think it not only knows about the frame but even changes the frame
    options. This seems far from "don't know anything about the frame", no?
    
    > I have two comments about this:
    > 
    > It isn't just for convenience, it is for correctness.  The window
    > functions do not need to know which rows they are *not* operating on.
    > 
    > There is no such thing as a "full" or "reduced" frame.  The standard
    > uses those terms to explain the difference between before and after
    > RPR is applied, but window functions do not get to choose which frame
    > they apply over.  They only ever apply over the reduced window frame.
    
    I agree that "full window frame" and "reduced window frame" do not
    exist at the same time, and in the end (after computation of the
    reduced frame), only the "reduced" frame is visible to window
    functions/aggregates. But I still think that "full window frame"
    and "reduced window frame" are important concepts for
    explaining/understanding how RPR works.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  27. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-07-28T08:56:26Z

    On 7/28/23 09:09, Tatsuo Ishii wrote:
    >>> We already recalculate a frame each time a row is processed even
    >>> without RPR. See ExecWindowAgg.
    >>
    >> Yes, after each row.  Not for each function.
    > 
    > Ok, I understand now. Closer look at the code, I realized that each
    > window function calls update_frameheadpos, which computes the frame
    > head position. But actually it checks winstate->framehead_valid and if
    > it's already true (probably by other window function), then it does
    > nothing.
    > 
    >>> Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
    >>> which means the frame head is changed each time current row position
    >>> changes.
    >>
    >> Off topic for now: I wonder why this restriction is in place and
    >> whether we should respect or ignore it.  That is a discussion for
    >> another time, though.
    > 
    > My guess is, it is because other than ROWS BETWEEN CURRENT ROW has
    > little or no meaning. Consider following example:
    
    Yes, that makes sense.
    
    >>>> I strongly disagree with this.  Window function do not need to know
    >>>> how the frame is defined, and indeed they should not.
    >>> We already break the rule by defining *support functions. See
    >>> windowfuncs.c.
    >> The support functions don't know anything about the frame, they just
    >> know when a window function is monotonically increasing and execution
    >> can either stop or be "passed through".
    > 
    > I see following code in window_row_number_support:
    > 
    > 		/*
    > 		 * The frame options can always become "ROWS BETWEEN UNBOUNDED
    > 		 * PRECEDING AND CURRENT ROW".  row_number() always just increments by
    > 		 * 1 with each row in the partition.  Using ROWS instead of RANGE
    > 		 * saves effort checking peer rows during execution.
    > 		 */
    > 		req->frameOptions = (FRAMEOPTION_NONDEFAULT |
    > 							 FRAMEOPTION_ROWS |
    > 							 FRAMEOPTION_START_UNBOUNDED_PRECEDING |
    > 							 FRAMEOPTION_END_CURRENT_ROW);
    > 
    > I think it not only knows about frame but it even changes the frame
    > options. This seems far from "don't know anything about the frame", no?
    
    That's the planner support function.  The row_number() function itself 
    is not even allowed to *have* a frame, per spec.  We allow it, but as 
    you can see from that support function, we completely replace it.
    
    So all of the partition-level window functions are not affected by RPR 
    anyway.
    
    >> I have two comments about this:
    >>
    >> It isn't just for convenience, it is for correctness.  The window
    >> functions do not need to know which rows they are *not* operating on.
    >>
    >> There is no such thing as a "full" or "reduced" frame.  The standard
    >> uses those terms to explain the difference between before and after
    >> RPR is applied, but window functions do not get to choose which frame
    >> they apply over.  They only ever apply over the reduced window frame.
    > 
    > I agree that "full window frame" and "reduced window frame" do not
    > exist at the same time, and in the end (after computation of reduced
    > frame), only "reduced" frame is visible to window
    > functions/aggregates. But I still do think that "full window frame"
    > and "reduced window frame" are important concept to explain/understand
    > how PRP works.
    
    If we are just using those terms for documentation, then okay.
    -- 
    Vik Fearing
    
    
    
    
    
  28. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-07-28T09:21:25Z

    On 7/26/23 14:21, Tatsuo Ishii wrote:
    > Attached is the v3 patch. In this patch following changes are made.
    
    Excellent.  Thanks!
    
    A few quick comments:
    
    - PERMUTE is still misspelled as PREMUTE
    
    - PATTERN variables do not have to exist in the DEFINE clause.  They are 
    considered TRUE if not present.
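
    The implicit-TRUE rule in the second comment could be sketched as
    follows. This is only an illustration under assumptions: DefineEntry,
    eval_pattern_variable, is_up and is_down are hypothetical names, and
    a DEFINE condition is reduced to a predicate over the current and
    previous price.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A DEFINE entry: variable name plus its condition (simplified). */
typedef bool (*RowPredicate) (long price, long prev_price);

typedef struct DefineEntry
{
	const char *name;
	RowPredicate pred;
} DefineEntry;

bool
is_up(long price, long prev_price)
{
	return price > prev_price;
}

bool
is_down(long price, long prev_price)
{
	return price < prev_price;
}

/*
 * Evaluate a PATTERN variable for one row.  A variable that has no
 * DEFINE entry is treated as TRUE.
 */
bool
eval_pattern_variable(const DefineEntry *defs, int ndefs, const char *name,
					  long price, long prev_price)
{
	assert(name != NULL);

	for (int i = 0; i < ndefs; i++)
	{
		if (strcmp(defs[i].name, name) == 0)
			return defs[i].pred(price, prev_price);
	}
	return true;				/* not in DEFINE: implicitly TRUE */
}
```

    With only UP and DOWN defined, a START variable used in PATTERN would
    evaluate to true for every row, which is the same effect as writing
    START AS TRUE explicitly.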
    
    > (1) I completely changed the pattern matching engine so that it
    > performs backtracking. Now the engine evaluates all pattern elements
    > defined in PATTER against each row, saving matched pattern variables
    > in a string per row. For example if the pattern element A and B
    > evaluated to true, a string "AB" is created for current row.
    > 
    > This continues until all pattern matching fails or encounters the end
    > of full window frame/partition. After that, the pattern matching
    > engine creates all possible "pattern strings" and apply the regular
    > expression matching to each. For example if we have row 0 = "AB" row 1
    > = "C", possible pattern strings are: "AC" and "BC".
    > 
    > If it matches, the length of matching substring is saved. After all
    > possible trials are done, the longest matching substring is chosen and
    > it becomes the width (number of rows) in the reduced window frame.
    > 
    > See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
    > in nodeWindowAggs.c for more details. For now I use a naive depth
    > first search and probably there is a lot of rooms for optimization
    > (for example rewriting it without using
    > recursion). Suggestions/patches are welcome.
    
    My own reviews will only focus on correctness for now.  Once we get a 
    good set of regression tests all passing, I will focus more on 
    optimization.  Of course, others might want to review the performance now.
    
    > Vik Fearing wrote:
    >> I strongly disagree with this.  Window function do not need to know
    >> how the frame is defined, and indeed they should not.
    >> WinGetFuncArgInFrame should answer yes or no and the window function
    >> just works on that. Otherwise we will get extension (and possibly even
    >> core) functions that don't handle the frame properly.
    > 
    > So I moved row_is_in_reduce_frame into WinGetFuncArgInFrame so that
    > those window functions are not needed to be changed.
    > 
    > (3) Window function rpr was removed. We can use first_value instead.
    
    Excellent.
    
    > (4) Remaining tasks/issues.
    > 
    > - For now I disable WinSetMarkPosition because RPR often needs to
    >    access a row before the mark is set. We need to fix this in the
    >    future.
    
    Noted, and agreed.
    
    > - I am working on making window aggregates RPR aware now. The
    >    implementation is in progress and far from completeness. An example
    >    is below. I think row 2, 3, 4 of "count" column should be NULL
    >    instead of 3, 2, 0, 0. Same thing can be said to other
    >    rows. Probably this is an effect of moving aggregate but I still
    >    studying the window aggregation code.
    
    This tells me again that RPR is not being run in the right place.  All 
    windowed aggregates and frame-level window functions should Just Work 
    with no modification.
    
    > SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM stock
    >   WINDOW w AS (
    >   PARTITION BY company
    >   ORDER BY tdate
    >   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >   AFTER MATCH SKIP PAST LAST ROW
    >   INITIAL
    >   PATTERN (START UP+ DOWN+)
    >   DEFINE
    >    START AS TRUE,
    >    UP AS price > PREV(price),
    >    DOWN AS price < PREV(price)
    > );
    >   company  |   tdate    | first_value | count
    > ----------+------------+-------------+-------
    >   company1 | 2023-07-01 |         100 |     4
    >   company1 | 2023-07-02 |             |     3
    >   company1 | 2023-07-03 |             |     2
    >   company1 | 2023-07-04 |             |     0
    >   company1 | 2023-07-05 |             |     0
    >   company1 | 2023-07-06 |          90 |     4
    >   company1 | 2023-07-07 |             |     3
    >   company1 | 2023-07-08 |             |     2
    >   company1 | 2023-07-09 |             |     0
    >   company1 | 2023-07-10 |             |     0
    >   company2 | 2023-07-01 |          50 |     4
    >   company2 | 2023-07-02 |             |     3
    >   company2 | 2023-07-03 |             |     2
    >   company2 | 2023-07-04 |             |     0
    >   company2 | 2023-07-05 |             |     0
    >   company2 | 2023-07-06 |          60 |     4
    >   company2 | 2023-07-07 |             |     3
    >   company2 | 2023-07-08 |             |     2
    >   company2 | 2023-07-09 |             |     0
    >   company2 | 2023-07-10 |             |     0
    
    In this scenario, row 1's frame is the first 5 rows and specified SKIP 
    PAST LAST ROW, so rows 2-5 don't have *any* frame (because they are 
    skipped) and the result of the outer count should be 0 for all of them 
    because there are no rows in the frame.
    
    When we get to adding count in the MEASURES clause, there will be a 
    difference between no match and empty match, but that does not apply here.
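
    To make the expected count column concrete, here is a minimal
    standalone C sketch of AFTER MATCH SKIP PAST LAST ROW for the pattern
    START UP+ DOWN+ over one partition. The function names match_len and
    count_over_reduced_frames are hypothetical, and greedy matching
    without backtracking is assumed to be enough here only because UP and
    DOWN cannot both be true for the same row.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of rows matched by START UP+ DOWN+ beginning at row "start",
 * or 0 if there is no match.  START is TRUE, so it always consumes the
 * first row; UP and DOWN compare a row with the previous one.
 */
static int
match_len(const long *price, int nrows, int start)
{
	int			pos = start + 1;
	int			ups = 0;
	int			downs = 0;

	while (pos < nrows && price[pos] > price[pos - 1])
	{
		pos++;
		ups++;
	}
	if (ups == 0)
		return 0;

	while (pos < nrows && price[pos] < price[pos - 1])
	{
		pos++;
		downs++;
	}
	if (downs == 0)
		return 0;

	return pos - start;
}

/*
 * count(*) over the reduced frame of each row.  The first row of a
 * match sees the whole match; skipped and unmatched rows have an
 * empty frame, so their count is 0.
 */
void
count_over_reduced_frames(const long *price, int nrows, int *count)
{
	int			row = 0;

	assert(price != NULL && count != NULL);

	while (row < nrows)
	{
		int			len = match_len(price, nrows, row);

		if (len > 0)
		{
			count[row] = len;
			for (int k = 1; k < len; k++)
				count[row + k] = 0; /* skipped: no frame */
			row += len;			/* SKIP PAST LAST ROW */
		}
		else
		{
			count[row] = 0;		/* unmatched: empty frame */
			row++;
		}
	}
}
```

    For the company1 prices used in the regression test (100, 200, 150,
    140, 150, 90, 110, 130, 120, 130) the sketch yields 4, 0, 0, 0, 0, 4,
    0, 0, 0, 0.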
    
    > I am going to add this thread to CommitFest and plan to add both of
    > you as reviewers. Thanks in advance.
    
    My pleasure.  Thank you for working on this difficult feature.
    -- 
    Vik Fearing
    
    
    
    
    
  29. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-28T11:02:30Z

    >> Attached is the v3 patch. In this patch following changes are made.
    > 
    > Excellent.  Thanks!
    
    You are welcome!
    
    > A few quick comments:
    > 
    > - PERMUTE is still misspelled as PREMUTE
    
    Oops. Will fix.
    
    > - PATTERN variables do not have to exist in the DEFINE clause.  They are
    >   considered TRUE if not present.
    
    Do you think we really need this? I found a criticism regarding this.
    
    https://link.springer.com/article/10.1007/s13222-022-00404-3
    "3.2 Explicit Definition of All Row Pattern Variables"
    
    What do you think?
    
    >> - I am working on making window aggregates RPR aware now. The
    >>    implementation is in progress and far from completeness. An example
    >>    is below. I think row 2, 3, 4 of "count" column should be NULL
    >>    instead of 3, 2, 0, 0. Same thing can be said to other
    >>    rows. Probably this is an effect of moving aggregate but I still
    >>    studying the window aggregation code.
    > 
    > This tells me again that RPR is not being run in the right place.  All
    > windowed aggregates and frame-level window functions should Just Work
    > with no modification.
    
    I am not touching each aggregate function. I am modifying
    eval_windowaggregates() in nodeWindowAgg.c, which calls each aggregate
    function. Do you think it's not the right place to make window
    aggregates RPR aware?
    
    >> SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM
    >> stock
    >>   WINDOW w AS (
    >>   PARTITION BY company
    >>   ORDER BY tdate
    >>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >>   AFTER MATCH SKIP PAST LAST ROW
    >>   INITIAL
    >>   PATTERN (START UP+ DOWN+)
    >>   DEFINE
    >>    START AS TRUE,
    >>    UP AS price > PREV(price),
    >>    DOWN AS price < PREV(price)
    >> );
    >>   company  |   tdate    | first_value | count
    >> ----------+------------+-------------+-------
    >>   company1 | 2023-07-01 |         100 |     4
    >>   company1 | 2023-07-02 |             |     3
    >>   company1 | 2023-07-03 |             |     2
    >>   company1 | 2023-07-04 |             |     0
    >>   company1 | 2023-07-05 |             |     0
    >>   company1 | 2023-07-06 |          90 |     4
    >>   company1 | 2023-07-07 |             |     3
    >>   company1 | 2023-07-08 |             |     2
    >>   company1 | 2023-07-09 |             |     0
    >>   company1 | 2023-07-10 |             |     0
    >>   company2 | 2023-07-01 |          50 |     4
    >>   company2 | 2023-07-02 |             |     3
    >>   company2 | 2023-07-03 |             |     2
    >>   company2 | 2023-07-04 |             |     0
    >>   company2 | 2023-07-05 |             |     0
    >>   company2 | 2023-07-06 |          60 |     4
    >>   company2 | 2023-07-07 |             |     3
    >>   company2 | 2023-07-08 |             |     2
    >>   company2 | 2023-07-09 |             |     0
    >>   company2 | 2023-07-10 |             |     0
    > 
    > In this scenario, row 1's frame is the first 5 rows and specified SKIP
    > PAST LAST ROW, so rows 2-5 don't have *any* frame (because they are
    > skipped) and the result of the outer count should be 0 for all of them
    > because there are no rows in the frame.
    
    OK. I just want to make sure: if it's another aggregate like sum or
    avg, the result of the outer aggregate should be NULL.
    
    > When we get to adding count in the MEASURES clause, there will be a
    > difference between no match and empty match, but that does not apply
    > here.
    
    Can you elaborate more? I understand that "no match" and "empty match"
    are different things. But I do not understand how the difference
    affects the result of count.
    
    >> I am going to add this thread to CommitFest and plan to add both of
    >> you as reviewers. Thanks in advance.
    > 
    > My pleasure.  Thank you for working on this difficult feature.
    
    Thank you for agreeing to be registered as a reviewer. Your comments
    are really helpful.
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  30. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-07-28T12:36:58Z

    On 7/28/23 13:02, Tatsuo Ishii wrote:
    >>> Attached is the v3 patch. In this patch following changes are made.
    >>
    >> - PATTERN variables do not have to exist in the DEFINE clause.  They are
    >> - considered TRUE if not present.
    > 
    > Do you think we really need this? I found a criticism regarding this.
    > 
    > https://link.springer.com/article/10.1007/s13222-022-00404-3
    > "3.2 Explicit Definition of All Row Pattern Variables"
    > 
    > What do you think?
    
    I think that a large part of obeying the standard is to allow queries 
    from other engines to run the same on ours.  The standard does not 
    require the pattern variables to be defined and so there are certainly 
    queries out there without them, and that hurts migrating to PostgreSQL.
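
To illustrate the rule discussed here (a PATTERN variable with no DEFINE entry is treated as TRUE), here is a minimal Python sketch. The function name `evaluate_variable` and the dictionary-of-lambdas representation of DEFINE are assumptions made for illustration only; they are not taken from the patch.

```python
# Sketch only: evaluate_variable and the `defines` mapping are hypothetical
# names used for illustration, not part of the patch.

def evaluate_variable(name, defines, row, prev_row):
    """Return whether `row` satisfies pattern variable `name`.

    A variable that appears in PATTERN but has no DEFINE entry is
    treated as always TRUE.
    """
    condition = defines.get(name)
    if condition is None:
        return True
    return condition(row, prev_row)


# PATTERN (START UP+ DOWN+) with START deliberately left out of DEFINE.
defines = {
    "UP": lambda row, prev: prev is not None and row["price"] > prev["price"],
    "DOWN": lambda row, prev: prev is not None and row["price"] < prev["price"],
}

rows = [{"price": 100}, {"price": 120}, {"price": 90}]  # made-up data
start_ok = evaluate_variable("START", defines, rows[0], None)
up_ok = evaluate_variable("UP", defines, rows[1], rows[0])
down_ok = evaluate_variable("DOWN", defines, rows[2], rows[1])
```

With this default, a query written for another engine that omits `START AS TRUE` classifies rows the same way as one that spells it out.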
    
    >>> - I am working on making window aggregates RPR aware now. The
    >>>     implementation is in progress and far from completeness. An example
    >>>     is below. I think row 2, 3, 4 of "count" column should be NULL
    >>>     instead of 3, 2, 0, 0. Same thing can be said to other
    >>>     rows. Probably this is an effect of moving aggregate but I still
    >>>     studying the window aggregation code.
    >>
    >> This tells me again that RPR is not being run in the right place.  All
    >> windowed aggregates and frame-level window functions should Just Work
    >> with no modification.
    > 
    > I am not touching each aggregate function. I am modifying
    > eval_windowaggregates() in nodeWindowAgg.c, which calls each aggregate
    > function. Do you think it's not the right place to make window
    > aggregates RPR aware?
    
    Oh, okay.
    
    >>> SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM
    >>> stock
    >>>    WINDOW w AS (
    >>>    PARTITION BY company
    >>>    ORDER BY tdate
    >>>    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >>>    AFTER MATCH SKIP PAST LAST ROW
    >>>    INITIAL
    >>>    PATTERN (START UP+ DOWN+)
    >>>    DEFINE
    >>>     START AS TRUE,
    >>>     UP AS price > PREV(price),
    >>>     DOWN AS price < PREV(price)
    >>> );
    >>>    company  |   tdate    | first_value | count
    >>> ----------+------------+-------------+-------
    >>>    company1 | 2023-07-01 |         100 |     4
    >>>    company1 | 2023-07-02 |             |     3
    >>>    company1 | 2023-07-03 |             |     2
    >>>    company1 | 2023-07-04 |             |     0
    >>>    company1 | 2023-07-05 |             |     0
    >>>    company1 | 2023-07-06 |          90 |     4
    >>>    company1 | 2023-07-07 |             |     3
    >>>    company1 | 2023-07-08 |             |     2
    >>>    company1 | 2023-07-09 |             |     0
    >>>    company1 | 2023-07-10 |             |     0
    >>>    company2 | 2023-07-01 |          50 |     4
    >>>    company2 | 2023-07-02 |             |     3
    >>>    company2 | 2023-07-03 |             |     2
    >>>    company2 | 2023-07-04 |             |     0
    >>>    company2 | 2023-07-05 |             |     0
    >>>    company2 | 2023-07-06 |          60 |     4
    >>>    company2 | 2023-07-07 |             |     3
    >>>    company2 | 2023-07-08 |             |     2
    >>>    company2 | 2023-07-09 |             |     0
    >>>    company2 | 2023-07-10 |             |     0
    >>
    >> In this scenario, row 1's frame is the first 5 rows and specified SKIP
    >> PAST LAST ROW, so rows 2-5 don't have *any* frame (because they are
    >> skipped) and the result of the outer count should be 0 for all of them
    >> because there are no rows in the frame.
    > 
    > Ok. Just I want to make sure. If it's other aggregates like sum or
    > avg, the result of the outer aggregates should be NULL.
    
    They all behave the same way as in a normal query when they receive no 
    rows as input.
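
The behavior described above can be sketched in Python under some loud assumptions: `match_length` is a hypothetical stand-in for the pattern matcher (it maps a starting row to the number of rows matched there), the prices are made up, and `sql_count`/`sql_sum` only imitate how count and sum treat an empty input. It is not how nodeWindowAgg.c is structured.

```python
# Sketch only: match_length is a hypothetical stand-in for the pattern
# matcher; it maps a starting row index to the number of rows matched.

def reduced_frames(n_rows, match_length):
    """Return one (start, end) row range per row under
    AFTER MATCH SKIP PAST LAST ROW.

    A row that starts a match gets the matched rows as its frame; rows
    consumed by that match are skipped and get an empty frame.
    """
    frames = []
    i = 0
    while i < n_rows:
        length = match_length.get(i, 0)
        if length > 0:
            frames.append((i, i + length))
            # Rows inside the match are skipped: empty frame for each.
            frames.extend((j, j) for j in range(i + 1, i + length))
            i += length
        else:
            frames.append((i, i))
            i += 1
    return frames


def sql_count(values):
    return len(values)  # count(*) over no rows is 0


def sql_sum(values):
    return sum(values) if values else None  # sum over no rows is NULL


prices = [100, 120, 130, 110, 90, 90, 95, 99, 80, 70]  # made-up data
frames = reduced_frames(len(prices), {0: 5, 5: 5})     # assumed matches
counts = [sql_count(prices[s:e]) for s, e in frames]
sums = [sql_sum(prices[s:e]) for s, e in frames]
```

Here the skipped rows get a count of 0 and a sum of None (NULL), the same values those aggregates return over zero rows in an ordinary query.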
    
    >> When we get to adding count in the MEASURES clause, there will be a
    >> difference between no match and empty match, but that does not apply
    >> here.
    > 
    > Can you elaborate more? I understand that "no match" and "empty match"
    > are different things. But I do not understand how the difference
    > affects the result of count.
    
    This query:
    
    SELECT v.a, wcnt OVER w, count(*) OVER w
    FROM (VALUES ('A')) AS v (a)
    WINDOW w AS (
       ORDER BY v.a
       MEASURES count(*) AS wcnt
       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
       PATTERN (B)
       DEFINE B AS B.a = 'B'
    )
    
    produces this result:
    
      a | wcnt | count
    ---+------+-------
      A |      |     0
    (1 row)
    
    Inside the window specification, *no match* was found and so all of the 
    MEASURES are null.  The count(*) in the target list however, still 
    exists and operates over zero rows.
    
    This very similar query:
    
    SELECT v.a, wcnt OVER w, count(*) OVER w
    FROM (VALUES ('A')) AS v (a)
    WINDOW w AS (
       ORDER BY v.a
       MEASURES count(*) AS wcnt
       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
       PATTERN (B?)
       DEFINE B AS B.a = 'B'
    )
    
    produces this result:
    
      a | wcnt | count
    ---+------+-------
      A |    0 |     0
    (1 row)
    
    In this case, the pattern is B? instead of just B, which produces an 
    *empty match* for the MEASURES to be applied over.
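
The two results above can be reproduced with a small Python sketch. The names `find_match` and `evaluate` are hypothetical and the matcher is hard-wired to the single-variable patterns (B) and (B?) from the example; it only illustrates the no-match versus empty-match distinction.

```python
# Sketch only: find_match and evaluate are hypothetical names used to
# illustrate the distinction; they do not come from the patch.
from typing import List, Optional, Tuple


def find_match(rows: List[str], optional: bool) -> Optional[List[str]]:
    """Match PATTERN (B) or PATTERN (B?) at the first row, with
    DEFINE B AS a = 'B'.

    Returns the matched rows, an empty list for an empty match, or
    None when there is no match at all.
    """
    if rows and rows[0] == "B":
        return [rows[0]]
    return [] if optional else None


def evaluate(rows: List[str], optional: bool) -> Tuple[Optional[int], int]:
    """Return (wcnt, count) for the current row."""
    match = find_match(rows, optional)
    # MEASURES are NULL when nothing matched, but are computed over zero
    # rows when the match is empty.
    wcnt = None if match is None else len(match)
    # The outer count(*) runs over the reduced frame, which holds no
    # rows in either case.
    count = len(match or [])
    return wcnt, count


no_match = evaluate(["A"], optional=False)    # PATTERN (B)
empty_match = evaluate(["A"], optional=True)  # PATTERN (B?)
```
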
    -- 
    Vik Fearing
    
    
    
    
    
  31. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-29T03:05:08Z

    >>> - PATTERN variables do not have to exist in the DEFINE clause.  They are
    >>> - considered TRUE if not present.
    >> Do you think we really need this? I found a criticism regarding this.
    >> https://link.springer.com/article/10.1007/s13222-022-00404-3
    >> "3.2 Explicit Definition of All Row Pattern Variables"
    >> What do you think?
    > 
    > I think that a large part of obeying the standard is to allow queries
    > from other engines to run the same on ours.  The standard does not
    > require the pattern variables to be defined and so there are certainly
    > queries out there without them, and that hurts migrating to
    > PostgreSQL.
    
    Yeah, migration is a good point. I agree we should have the feature.
    
    >>> When we get to adding count in the MEASURES clause, there will be a
    >>> difference between no match and empty match, but that does not apply
    >>> here.
    >> Can you elaborate more? I understand that "no match" and "empty match"
    >> are different things. But I do not understand how the difference
    >> affects the result of count.
    > 
    > This query:
    > 
    > SELECT v.a, wcnt OVER w, count(*) OVER w
    > FROM (VALUES ('A')) AS v (a)
    > WINDOW w AS (
    >   ORDER BY v.a
    >   MEASURES count(*) AS wcnt
    >   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >   PATTERN (B)
    >   DEFINE B AS B.a = 'B'
    > )
    > 
    > produces this result:
    > 
    >  a | wcnt | count
    > ---+------+-------
    >  A |      |     0
    > (1 row)
    > 
    > Inside the window specification, *no match* was found and so all of
    > the MEASURES are null.  The count(*) in the target list however, still
    > exists and operates over zero rows.
    > 
    > This very similar query:
    > 
    > SELECT v.a, wcnt OVER w, count(*) OVER w
    > FROM (VALUES ('A')) AS v (a)
    > WINDOW w AS (
    >   ORDER BY v.a
    >   MEASURES count(*) AS wcnt
    >   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >   PATTERN (B?)
    >   DEFINE B AS B.a = 'B'
    > )
    > 
    > produces this result:
    > 
    >  a | wcnt | count
    > ---+------+-------
    >  A |    0 |     0
    > (1 row)
    > 
    > In this case, the pattern is B? instead of just B, which produces an
    > *empty match* for the MEASURES to be applied over.
    
    Thank you for the detailed explanation. I think I understand now.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  32. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-08-09T08:41:12Z

    Attached is the v4 patch. Differences from previous patch include:
    
    > - PERMUTE is still misspelled as PREMUTE
    
    Fixed.
    
    > - PATTERN variables do not have to exist in the DEFINE clause.  They are
    > - considered TRUE if not present.
    
    Fixed. A new regression test case is also added.
    
    - It was possible for TargetEntry (tle) nodes in the DEFINE clause
      not to appear in the plan's target list. This makes it impossible
      to evaluate expressions in the DEFINE clause because they do not
      appear in the outer plan's target list. To fix this, call
      findTargetlistEntrySQL99 (with resjunk set to true) so that the
      missing TargetEntry is added to the outer plan later on.
    
    - I eliminated some hacks in handling the Var nodes in the DEFINE
      clause. Previously I replaced the varattno of a Var node in the plan
      tree by hand so that it refers to the correct varattno in the outer
      plan node. In this patch I modified set_upper_references so that it
      calls fix_upper_expr for those Var nodes in the DEFINE clause. See
      the v4-0003 patch for more details.
    
    - I found a bug in the pattern matching code. It creates a string for
      subsequent regular expression matching, using the initial letter
      of each DEFINE variable name. For example, if the variable name is
      "foo", then "f" is used. Obviously this causes trouble if two or
      more variables start with the same letter (e.g. "foo" and "food").
      To fix this, I assign a letter from [a-z] to each variable instead
      of its initial letter. However, this limits us to at most 26
      variables. I hope 26 is enough for most use cases.
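
The collision and the fix can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes letters are handed out in the order the variables are listed; the patch may assign them differently.

```python
# Sketch only: both function names are hypothetical. Letters are assumed
# to be assigned in the order the variables are listed.
import string


def initial_letter_map(var_names):
    """Old scheme: each variable is encoded by its initial letter."""
    return {name: name[0].lower() for name in var_names}


def indexed_letter_map(var_names):
    """New scheme: the i-th variable gets the i-th letter of a-z."""
    if len(var_names) > len(string.ascii_lowercase):
        raise ValueError("at most 26 pattern variables are supported")
    return dict(zip(var_names, string.ascii_lowercase))


names = ["foo", "food", "bar"]
old = initial_letter_map(names)  # "foo" and "food" both map to "f"
new = indexed_letter_map(names)  # every variable gets a distinct letter
```
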
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  33. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-02T06:52:35Z

    Attached is the v5 patch. Differences from previous patch include:
    
    * Resolve complaint from "PostgreSQL Patch Tester"
      https://commitfest.postgresql.org/44/4460/
    
    - Change gram.y to use PATTERN_P instead of PATTERN. Using PATTERN seems
      to cause trouble with the Visual Studio build.
    
    :
    :
    [10:07:57.853] FAILED: src/backend/parser/parser.a.p/meson-generated_.._gram.c.obj 
    [10:07:57.853] "cl" "-Isrc\backend\parser\parser.a.p" "-Isrc\backend\parser" "-I..\src\backend\parser" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "-DBUILDING_DLL" "/Fdsrc\backend\parser\parser.a.p\meson-generated_.._gram.c.pdb" /Fosrc/backend/parser/parser.a.p/meson-generated_.._gram.c.obj "/c" src/backend/parser/gram.c
    [10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): error C2365: 'PATTERN': redefinition; previous definition was 'typedef'
    [10:07:57.860] C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\um\wingdi.h(1375): note: see declaration of 'PATTERN'
    [10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): error C2086: 'yytokentype PATTERN': redefinition
    [10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): note: see declaration of 'PATTERN'
    [10:07:57.860] ninja: build stopped: subcommand failed.
    
    * Resolve complaint from "make headerscheck"
    
    - Change windowapi.h and nodeWindowAgg.c to remove unnecessary extern
      and public functions.
      
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  34. Re: Row pattern recognition

    Erik Rijkers <er@xs4all.nl> — 2023-09-02T18:04:02Z

    On 9/2/23 at 08:52, Tatsuo Ishii wrote:
    
    > Attached is the v5 patch. Differences from previous patch include:
    > 
    
    Hi,
    
    The patches compile & tests run fine but this statement from the 
    documentation crashes an assert-enabled server:
    
    SELECT company, tdate, price, max(price) OVER w FROM stock
    WINDOW w AS (
    PARTITION BY company
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    AFTER MATCH SKIP PAST LAST ROW
    INITIAL
    PATTERN (LOWPRICE UP+ DOWN+)
    DEFINE
    LOWPRICE AS price <= 100,
    UP AS price > PREV(price),
    DOWN AS price < PREV(price)
    );
    server closed the connection unexpectedly
    	This probably means the server terminated abnormally
    	before or while processing the request.
    connection to server was lost
    
    
    Log file:
    
    TRAP: failed Assert("aggregatedupto_nonrestarted <= 
    winstate->aggregatedupto"), File: "nodeWindowAgg.c", Line: 1054, PID: 68975
    postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
    SELECT(ExceptionalCondition+0x54)[0x9b0824]
    postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x71ae8d]
    postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
    SELECT(standard_ExecutorRun+0x13a)[0x6def9a]
    /home/aardvark/pg_stuff/pg_installations/pgsql.rpr/lib/pg_stat_statements.so(+0x55e5)[0x7ff3798b95e5]
    /home/aardvark/pg_stuff/pg_installations/pgsql.rpr/lib/auto_explain.so(+0x2680)[0x7ff3798ab680]
    postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x88a4ff]
    postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
    SELECT(PortalRun+0x240)[0x88bb50]
    postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x887cca]
    postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
    SELECT(PostgresMain+0x14dc)[0x88958c]
    postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x7fb0da]
    postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
    SELECT(PostmasterMain+0xd2d)[0x7fc01d]
    postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
    SELECT(main+0x1e0)[0x5286d0]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff378e9dd0a]
    postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
    SELECT(_start+0x2a)[0x5289aa]
    2023-09-02 19:59:05.329 CEST 46723 LOG:  server process (PID 68975) was 
    terminated by signal 6: Aborted
    2023-09-02 19:59:05.329 CEST 46723 DETAIL:  Failed process was running: 
    SELECT company, tdate, price, max(price) OVER w FROM stock
             WINDOW w AS (
             PARTITION BY company
             ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
             AFTER MATCH SKIP PAST LAST ROW
             INITIAL
             PATTERN (LOWPRICE UP+ DOWN+)
             DEFINE
             LOWPRICE AS price <= 100,
             UP AS price > PREV(price),
             DOWN AS price < PREV(price)
             );
    2023-09-02 19:59:05.329 CEST 46723 LOG:  terminating any other active 
    server processes
    
    
    
    Erik Rijkers
    
    
    
    
  35. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-03T00:03:44Z

    > Hi,
    > 
    > The patches compile & tests run fine but this statement from the
    > documentation crashes an assert-enabled server:
    > 
    > SELECT company, tdate, price, max(price) OVER w FROM stock
    > WINDOW w AS (
    > PARTITION BY company
    > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    > AFTER MATCH SKIP PAST LAST ROW
    > INITIAL
    > PATTERN (LOWPRICE UP+ DOWN+)
    > DEFINE
    > LOWPRICE AS price <= 100,
    > UP AS price > PREV(price),
    > DOWN AS price < PREV(price)
    > );
    > server closed the connection unexpectedly
    > 	This probably means the server terminated abnormally
    > 	before or while processing the request.
    > connection to server was lost
    
    Thank you for the report. Currently the patch has an issue with
    aggregate functions, including max. I have been working on aggregation
    in row pattern recognition, but it will take more time to complete
    that part.
    
    In the meantime, if you want to play with RPR, you can try window
    functions. Examples can be found in src/test/regress/sql/rpr.sql.
    Here are two of them:
    
    -- the first row starts with a price less than or equal to 100
    SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
     FROM stock
     WINDOW w AS (
     PARTITION BY company
     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
     INITIAL
     PATTERN (LOWPRICE UP+ DOWN+)
     DEFINE
      LOWPRICE AS price <= 100,
      UP AS price > PREV(price),
      DOWN AS price < PREV(price)
    );
    
    -- second row rises to more than 120% of the previous price
    SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
     FROM stock
     WINDOW w AS (
     PARTITION BY company
     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
     INITIAL
     PATTERN (LOWPRICE UP+ DOWN+)
     DEFINE
      LOWPRICE AS price <= 100,
      UP AS price > PREV(price) * 1.2,
      DOWN AS price < PREV(price)
    );
    
    Sorry for the inconvenience.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  36. Re: Row pattern recognition

    Jacob Champion <jchampion@timescale.com> — 2023-09-08T00:00:07Z

    Hello!
    
    > (1) I completely changed the pattern matching engine so that it
    > performs backtracking. Now the engine evaluates all pattern elements
    > defined in PATTERN against each row, saving matched pattern variables
    > in a string per row. For example if the pattern element A and B
    > evaluated to true, a string "AB" is created for current row.
    
    If I understand correctly, this strategy assumes that one row's
    membership in a pattern variable is independent of the other rows'
    membership. But I don't think that's necessarily true:
    
        DEFINE
          A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
          ...
    
    > See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
    > in nodeWindowAgg.c for more details. For now I use a naive depth
    > first search and probably there is a lot of room for optimization
    > (for example rewriting it without using
    > recursion).
    
    The depth-first match is doing a lot of subtle work here. For example, with
    
        PATTERN ( A+ B+ )
        DEFINE A AS TRUE,
               B AS TRUE
    
    (i.e. all rows match both variables), and three rows in the partition,
    our candidates will be tried in the order
    
        aaa
        aab
        aba
        abb
        ...
        bbb
    
    The only possible matches against our regex `^a+b+` are "aab" and "abb",
    and that order matches the preferment order, so it's fine. But it's easy
    to come up with a pattern where that's the wrong order, like
    
        PATTERN ( A+ (B|A)+ )
    
    Now "aaa" will be considered before "aab", which isn't correct.
    
    Similarly, the assumption that we want to match the longest string only
    works because we don't allow alternation yet.
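
The ordering problem can be made concrete with a short Python sketch. It assumes, as in the example, that every row satisfies both variables, that candidates are generated in the depth-first order listed above, and that a candidate must match the whole regular expression; `candidates` and `first_match` are hypothetical names, not the patch's functions.

```python
# Sketch only: candidates and first_match are hypothetical names. The
# enumeration order is assumed to match the depth-first order above.
import itertools
import re


def candidates(str_set):
    """Yield classification strings, varying the last row fastest."""
    for combo in itertools.product(*str_set):
        yield "".join(combo)


def first_match(str_set, regex):
    """Return the first candidate that matches `regex` in full."""
    pattern = re.compile(regex)
    for candidate in candidates(str_set):
        if pattern.fullmatch(candidate):
            return candidate
    return None


str_set = ["ab", "ab", "ab"]  # all three rows satisfy both A and B
order = list(candidates(str_set))
plain = first_match(str_set, r"a+b+")            # PATTERN ( A+ B+ )
alternation = first_match(str_set, r"a+(b|a)+")  # PATTERN ( A+ (B|A)+ )
```

For `A+ B+` the first hit is "aab", which agrees with the preferment order. For `A+ (B|A)+` the first hit is "aaa", although "aab" is the preferred classification because B comes first in the alternation.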
    
    > Suggestions/patches are welcome.
    
    Cool, I will give this piece some more thought. Do you mind if I try to
    add some more complicated pattern quantifiers to stress the
    architecture, or would you prefer to tackle that later? Just alternation
    by itself will open up a world of corner cases.
    
    > With the new engine, cases above do not fail anymore. See new
    > regression test cases. Thanks for providing valuable test cases!
    
    You're very welcome!
    
    On 8/9/23 01:41, Tatsuo Ishii wrote:
    > - I found a bug with pattern matching code. It creates a string for
    >   subsequent regular expression matching. It uses the initial letter
    >   of each define variable name. For example, if the varname is "foo",
    >   then "f" is used. Obviously this makes trouble if we have two or
    >   more variables starting with same "f" (e.g. "food"). To fix this, I
    >   assign [a-z] to each variable instead of its initial letter. However
    >   this way limits us not to have more than 26 variables. I hope 26 is
    >   enough for most use cases.
    
    There are still plenty of alphanumerics left that could be assigned...
    
    But I'm wondering if we might want to just implement the NFA directly?
    The current implementation's Cartesian explosion can probably be pruned
    aggressively, but replaying the entire regex match once for every
    backtracked step will still duplicate a lot of work.
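
As a rough illustration of what tracking NFA states directly could look like, here is a Python sketch hard-wired to PATTERN ( A+ B+ ). It assumes each row's membership in A and B is independent of how other rows are classified, and it only computes the longest match length; it does not handle classification or preferment order and is not a proposal for the actual design.

```python
# Sketch only: a hand-built state machine for PATTERN ( A+ B+ ), assuming
# each row's variable membership is independent of other rows.

def longest_match_a_plus_b_plus(str_set):
    """Return the number of leading rows in the longest A+ B+ match.

    States: 0 = nothing consumed, 1 = inside A+, 2 = inside B+ (accepting).
    str_set[i] holds the letters of the variables that are true for row i.
    """
    states = {0}
    longest = 0
    for i, true_vars in enumerate(str_set, start=1):
        next_states = set()
        if "a" in true_vars and states & {0, 1}:
            next_states.add(1)
        if "b" in true_vars and states & {1, 2}:
            next_states.add(2)
        if not next_states:
            break  # no state survives, so no longer match exists
        if 2 in next_states:
            longest = i
        states = next_states
    return longest


all_both = longest_match_a_plus_b_plus(["ab", "ab", "ab"])
stops_early = longest_match_a_plus_b_plus(["a", "b", "a"])
no_match = longest_match_a_plus_b_plus(["b"])
```

Each row is visited once and only the set of live states is kept, so the work grows with the number of rows rather than with the number of candidate strings.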
    
    --
    
    I've attached another test case; it looks like last_value() is depending
    on some sort of side effect from either first_value() or nth_value(). I
    know the window frame itself is still under construction, so apologies
    if this is an expected failure.
    
    Thanks!
    --Jacob
  37. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-08T03:54:47Z

    Hi,
    
    > Hello!
    > 
    >> (1) I completely changed the pattern matching engine so that it
    >> performs backtracking. Now the engine evaluates all pattern elements
    >> defined in PATTERN against each row, saving matched pattern variables
    >> in a string per row. For example if the pattern element A and B
    >> evaluated to true, a string "AB" is created for current row.
    > 
    > If I understand correctly, this strategy assumes that one row's
    > membership in a pattern variable is independent of the other rows'
    > membership. But I don't think that's necessarily true:
    > 
    >     DEFINE
    >       A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
    >       ...
    
    But:
    
    UP AS price > PREV(price)
    
    also depends on the previous row, no? Could you please elaborate on how
    your example could break the current implementation? I cannot test it
    because CLASSIFIER is not implemented yet.
    
    >> See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
    >> in nodeWindowAgg.c for more details. For now I use a naive depth
    >> first search and probably there is a lot of room for optimization
    >> (for example rewriting it without using
    >> recursion).
    > 
    > The depth-first match is doing a lot of subtle work here. For example, with
    > 
    >     PATTERN ( A+ B+ )
    >     DEFINE A AS TRUE,
    >            B AS TRUE
    > 
    > (i.e. all rows match both variables), and three rows in the partition,
    > our candidates will be tried in the order
    > 
    >     aaa
    >     aab
    >     aba
    >     abb
    >     ...
    >     bbb
    > 
    > The only possible matches against our regex `^a+b+` are "aab" and "abb",
    > and that order matches the preferment order, so it's fine. But it's easy
    > to come up with a pattern where that's the wrong order, like
    > 
    >     PATTERN ( A+ (B|A)+ )
    > 
    > Now "aaa" will be considered before "aab", which isn't correct.
    
    Can you explain a little bit more? I think 'aaa' matches the regular
    expression 'a+(b|a)+', so there should be no problem if it is
    considered before "aab".
    
    > Similarly, the assumption that we want to match the longest string only
    > works because we don't allow alternation yet.
    
    Can you please clarify more on this?
    
    > Cool, I will give this piece some more thought. Do you mind if I try to
    > add some more complicated pattern quantifiers to stress the
    > architecture, or would you prefer to tackle that later? Just alternation
    > by itself will open up a world of corner cases.
    
    Do you mean you want to provide a better patch for the pattern
    matching part? That will be helpful. Because I am currently working
    on the aggregation part and have no time to do it. However, the
    aggregation work affects the v5 patch: it needs refactoring. So can
    you wait until I release the v6 patch? I hope it will be released in
    two weeks or so.
    
    > On 8/9/23 01:41, Tatsuo Ishii wrote:
    >> - I found a bug with pattern matching code. It creates a string for
    >>   subsequent regular expression matching. It uses the initial letter
    >>   of each define variable name. For example, if the varname is "foo",
    >>   then "f" is used. Obviously this makes trouble if we have two or
    >>   more variables starting with same "f" (e.g. "food"). To fix this, I
    >>   assign [a-z] to each variable instead of its initial letter. However
    >>   this way limits us not to have more than 26 variables. I hope 26 is
    >>   enough for most use cases.
    > 
    > There are still plenty of alphanumerics left that could be assigned...
    > 
    > But I'm wondering if we might want to just implement the NFA directly?
    > The current implementation's Cartesian explosion can probably be pruned
    > aggressively, but replaying the entire regex match once for every
    > backtracked step will still duplicate a lot of work.
    
    Not sure if you mean implementing a new regular expression engine
    besides src/backend/regexp. I am afraid that is not trivial work. The
    current regexp code consists of over 10k lines. What do you think?
    
    > I've attached another test case; it looks like last_value() is depending
    > on some sort of side effect from either first_value() or nth_value(). I
    > know the window frame itself is still under construction, so apologies
    > if this is an expected failure.
    
    Thanks. Fortunately the current code I am working on passes the new
    test. I will include it in the next (v6) patch.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  38. Re: Row pattern recognition

    Jacob Champion <jchampion@timescale.com> — 2023-09-08T19:27:05Z

    On 9/7/23 20:54, Tatsuo Ishii wrote:
    >>     DEFINE
    >>       A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
    >>       ...
    > 
    > But:
    > 
    > UP AS price > PREV(price)
    > 
    > also depends on previous row, no?
    
    PREV(CLASSIFIER()) depends not on the value of the previous row but the
    state of the match so far. To take an example from the patch:
    
    > * Example:
    > * str_set[0] = "AB";
    > * str_set[1] = "AC";
    > * In this case at row 0 A and B are true, and A and C are true in row 1.
    
    With these str_sets and my example DEFINE, row[1] is only classifiable
    as 'A' if row[0] is *not* classified as 'A' at this point in the match.
    "AA" is not a valid candidate string, even if it matches the PATTERN.
    
    So if we don't reevaluate the pattern variable condition for the row, we
    at least have to prune the combinations that search_str_set() visits, so
    that we don't generate a logically impossible combination. That seems
    like it could be pretty fragile, and it may be difficult for us to prove
    compliance.
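
The pruning concern can be shown with a small Python sketch using the str_set values quoted above. The helper names are hypothetical, and the check hard-codes the single example definition `A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A'`; it is only meant to show which combinations become logically impossible.

```python
# Sketch only: helper names are hypothetical, and the validity check
# hard-codes  A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A'.
import itertools


def all_candidates(str_set):
    """Every per-row combination, ignoring cross-row dependencies."""
    return ["".join(combo) for combo in itertools.product(*str_set)]


def valid_under_define(candidate):
    """A row may be classified as 'A' only if the previous row was not.

    For the first row PREV(CLASSIFIER()) is NULL, which is distinct
    from 'A', so 'A' is always allowed there.
    """
    return all(not (prev == "A" and cur == "A")
               for prev, cur in zip(candidate, candidate[1:]))


str_set = ["AB", "AC"]  # row 0: A and B true; row 1: A and C true
unpruned = all_candidates(str_set)
pruned = [c for c in unpruned if valid_under_define(c)]
```

"AA" appears in the unpruned list even though it can never be a valid classification, so a search over independently computed str_sets would have to filter it out or re-evaluate the condition during the match.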
    
    >> But it's easy
    >> to come up with a pattern where that's the wrong order, like
    >>
    >>     PATTERN ( A+ (B|A)+ )
    >>
    >> Now "aaa" will be considered before "aab", which isn't correct.
    > 
    > Can you explain a little bit more? I think 'aaa' matches a regular
    > expression 'a+(b|a)+' and should be no problem before "aab" is
    > considered.
    
    Assuming I've understood the rules correctly, we're not allowed to
    classify the last row as 'A' if it also matches 'B'. Lexicographic
    ordering takes precedence, so we have to try "aab" first. Otherwise our
    query could return different results compared to another implementation.
    
    >> Similarly, the assumption that we want to match the longest string only
    >> works because we don't allow alternation yet.
    > 
    > Can you please clarify more on this?
    
    Sure: for the pattern
    
        PATTERN ( (A|B)+ )
    
    we have to consider the candidate "a" over the candidate "ba", even
    though the latter is longer. Like the prior example, lexicographic
    ordering is considered more important than the greedy quantifier.
    Quoting ISO/IEC 9075-2:2016:
    
        More precisely, with both reluctant and greedy quantifiers, the set
        of matches is ordered lexicographically, but when one match is an
        initial substring of another match, reluctant quantifiers prefer the
        shorter match (the substring), whereas greedy quantifiers prefer the
        longer match (the “superstring”).
    
    Here, "ba" doesn't have "a" as a prefix, so "ba" doesn't get priority.
    ISO/IEC 19075-5:2021 has a big section on this (7.2) with worked examples.
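
One way to picture the quoted rule is as a sort key, sketched here in Python. This is a simplification that assumes a single quantified alternation such as ( (A|B)+ ), with candidates written as strings over the alternation's variables; `preferment_key` is a hypothetical helper, and the standard's full preferment rules cover more cases than this.

```python
# Sketch only: preferment_key is a hypothetical helper covering a single
# quantified alternation; the standard's rules are more general.

def preferment_key(candidate, alternation_order, greedy=True):
    """Sort key: lexicographic by alternation order, with ties between a
    match and its extension broken by the quantifier type.

    Greedy quantifiers rank "end of match" after every variable (longer
    first); reluctant ones rank it before every variable (shorter first).
    """
    rank = {var: i for i, var in enumerate(alternation_order)}
    end = len(alternation_order) if greedy else -1
    return tuple(rank[c] for c in candidate) + (end,)


# PATTERN ( (A|B)+ ): A is listed first, so "a" outranks the longer "ba".
ordered = sorted(["ba", "a"], key=lambda m: preferment_key(m, "ab"))

matches = ["a", "ab", "aa", "ba"]
greedy_order = sorted(matches, key=lambda m: preferment_key(m, "ab"))
reluctant_order = sorted(
    matches, key=lambda m: preferment_key(m, "ab", greedy=False))
```

Greediness only decides between a match and its own extension ("a" versus "aa"); between "a" and "ba" the first differing variable decides, which is why the shorter candidate wins.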
    
    (The "lexicographic order matters more than greediness" concept was the
    most mind-bending part for me so far, probably because I haven't figured
    out how to translate the concept into POSIX EREs. It wouldn't make sense
    to say "the letter 't' can match 'a', 'B', or '3' in this regex", but
    that's what RPR is doing.)
    
    >> Cool, I will give this piece some more thought. Do you mind if I try to
    >> add some more complicated pattern quantifiers to stress the
    >> architecture, or would you prefer to tackle that later? Just alternation
    >> by itself will open up a world of corner cases.
    > 
    > Do you mean you want to provide a better patch for the pattern
    > matching part? That will be helpful.
    
    No guarantees that I'll find a better patch :D But yes, I will give it a
    try.
    
    > Because I am currently working
    > on the aggregation part and have no time to do it. However, the
    > aggregation work affects the v5 patch: it needs a refactoring. So can
    > you wait until I release v6 patch? I hope it will be released in two
    > weeks or so.
    
    Absolutely!
    
    >> But I'm wondering if we might want to just implement the NFA directly?
    >> The current implementation's Cartesian explosion can probably be pruned
    >> aggressively, but replaying the entire regex match once for every
    >> backtracked step will still duplicate a lot of work.
    > 
    > Not sure if you mean implementing a new regular expression engine
    > besides src/backend/regex. I am afraid it's not trivial work. The
    > current regexp code consists of over 10k lines. What do you think?
    
    Heh, I think it would be pretty foolish for me to code an NFA, from
    scratch, and then try to convince the community to maintain it.
    
    But:
    - I think we have to implement a parallel parser regardless (RPR PATTERN
    syntax looks incompatible with POSIX)
    - I suspect we need more control over the backtracking than the current
    pg_reg* API is going to give us, or else I'm worried performance is
    going to fall off a cliff with usefully-large partitions
    - there's a lot of stuff in POSIX EREs that we don't need, and of the
    features we do need, the + quantifier is probably one of the easiest
    - it seems easier to prove the correctness of a slow, naive,
    row-at-a-time engine, because we can compare it directly to the spec
    
    So what I'm thinking is: if I start by open-coding the + quantifier, and
    slowly add more pieces in, then it might be easier to see the parts of
    src/backend/regex that I've duplicated. We can try to expose those parts
    directly from the internal API to replace my bad implementation. And if
    there are parts that aren't duplicated, then it'll be easier to explain
    why we need something different from the current engine.
    
    Does that seem like a workable approach? (Worst-case, my code is just
    horrible, and we throw it in the trash.)
    
    >> I've attached another test case; it looks like last_value() is depending
    >> on some sort of side effect from either first_value() or nth_value(). I
    >> know the window frame itself is still under construction, so apologies
    >> if this is an expected failure.
    > 
    > Thanks. Fortunately the current code I am working on passes the new
    > test. I will include it in the next (v6) patch.
    
    Great!
    
    Thanks,
    --Jacob
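
To make the ordering rule discussed above concrete, here is a minimal
Python sketch (an illustration only, not code from the patch). It
enumerates classifier strings for PATTERN ( (A|B)+ ) in preference
order, assuming Perl-style rules: alternatives are tried left to right,
and a greedy + tries to extend a match before settling for the shorter
prefix. The names `candidates` and `first_match`, the length cap, and
the `feasible` set are all hypothetical.

```python
def candidates(alternatives, max_len):
    """Yield classifier strings for PATTERN ((X|Y|...)+) in preference order."""
    def extend(prefix):
        for var in alternatives:              # alternation: leftmost alternative first
            current = prefix + var
            if len(current) < max_len:
                yield from extend(current)    # greedy +: longer strings before their prefix
            yield current
    yield from extend("")


def first_match(alternatives, max_len, feasible):
    """Return the first candidate, in preference order, that is feasible."""
    for cand in candidates(alternatives, max_len):
        if cand in feasible:
            return cand
    return None


order = list(candidates("AB", 2))
# "A" is preferred over "BA": lexicographic order outranks length,
# because "A" is not a prefix of "BA".
winner = first_match("AB", 2, feasible={"A", "BA", "B"})
```

Here `order` comes out as AA, AB, A, BA, BB, B, so with the assumed
feasible set the chosen match is "A" even though "BA" is longer.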
    
    
    
    
  39. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-09-08T21:43:10Z

    On 9/8/23 21:27, Jacob Champion wrote:
    > On 9/7/23 20:54, Tatsuo Ishii wrote:
    
    >>> But it's easy
    >>> to come up with a pattern where that's the wrong order, like
    >>>
    >>>      PATTERN ( A+ (B|A)+ )
    >>>
    >>> Now "aaa" will be considered before "aab", which isn't correct.
    >>
    >> Can you explain a little bit more? I think 'aaa' matches a regular
    >> expression 'a+(b|a)+' and should be no problem before "aab" is
    >> considered.
    > 
    > Assuming I've understood the rules correctly, we're not allowed to
    > classify the last row as 'A' if it also matches 'B'. Lexicographic
    > ordering takes precedence, so we have to try "aab" first. Otherwise our
    > query could return different results compared to another implementation.
    
    
    Your understanding is correct.
    -- 
    Vik Fearing
    
    
    
    
    
  40. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-09T11:21:21Z

    Hi,
    
    >> But:
    >> 
    >> UP AS price > PREV(price)
    >> 
    >> also depends on previous row, no?
    > 
    > PREV(CLASSIFIER()) depends not on the value of the previous row but the
    > state of the match so far. To take an example from the patch:
    > 
    >> * Example:
    >> * str_set[0] = "AB";
    >> * str_set[1] = "AC";
    >> * In this case at row 0 A and B are true, and A and C are true in row 1.
    > 
    > With these str_sets and my example DEFINE, row[1] is only classifiable
    > as 'A' if row[0] is *not* classified as 'A' at this point in the match.
    > "AA" is not a valid candidate string, even if it matches the PATTERN.
    
    OK, let me clarify my understanding. Suppose we have:
    
    PATTERN (A B)
    DEFINE A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
    B AS price > 100
    
    and the target table has price column values:
    
    row[0]: 110
    row[1]: 110
    row[2]: 110
    row[3]: 110
    
    Then we will get for str_set:
    r0: B
    r1: AB
    
    Because r0 only has classifier B, r1 can have A and B. The problem is
    r2. If we choose A at r1, then r2 = B. But if we choose B at r1, then
    r2 = AB. I guess this is the issue you pointed out.
    
    > So if we don't reevaluate the pattern variable condition for the row, we
    > at least have to prune the combinations that search_str_set() visits, so
    > that we don't generate a logically impossible combination. That seems
    > like it could be pretty fragile, and it may be difficult for us to prove
    > compliance.
    
    Yeah, probably we have to delay evaluation of pattern variables like
    A, then reevaluate A after the first scan.
    
    What about leaving this (reevaluation) for now? Because:
    
    1) we don't have CLASSIFIER
    2) we don't allow CLASSIFIER to be given to PREV as its argument
    
    so I think we don't need to worry about this for now.
    
    >> Can you explain a little bit more? I think 'aaa' matches a regular
    >> expression 'a+(b|a)+' and should be no problem before "aab" is
    >> considered.
    > 
    > Assuming I've understood the rules correctly, we're not allowed to
    > classify the last row as 'A' if it also matches 'B'. Lexicographic
    > ordering takes precedence, so we have to try "aab" first. Otherwise our
    > query could return different results compared to another implementation.
    > 
    >>> Similarly, the assumption that we want to match the longest string only
    >>> works because we don't allow alternation yet.
    >> 
    >> Can you please clarify more on this?
    > 
    > Sure: for the pattern
    > 
    >     PATTERN ( (A|B)+ )
    > 
    > we have to consider the candidate "a" over the candidate "ba", even
    > though the latter is longer. Like the prior example, lexicographic
    > ordering is considered more important than the greedy quantifier.
    > Quoting ISO/IEC 9075-2:2016:
    > 
    >     More precisely, with both reluctant and greedy quantifiers, the set
    >     of matches is ordered lexicographically, but when one match is an
    >     initial substring of another match, reluctant quantifiers prefer the
    >     shorter match (the substring), whereas greedy quantifiers prefer the
    >     longer match (the “superstring”).
    > 
    > Here, "ba" doesn't have "a" as a prefix, so "ba" doesn't get priority.
    > ISO/IEC 19075-5:2021 has a big section on this (7.2) with worked examples.
    > 
    > (The "lexicographic order matters more than greediness" concept was the
    > most mind-bending part for me so far, probably because I haven't figured
    > out how to translate the concept into POSIX EREs. It wouldn't make sense
    > to say "the letter 't' can match 'a', 'B', or '3' in this regex", but
    > that's what RPR is doing.)
    
    Thanks for the explanation.  Surprising concept of the standard:-) Is
    it different from SIMILAR TO REs too?
    
    What if we don't follow the standard, instead we follow POSIX EREs?  I
    think this is better for users unless RPR's REs have significant merit
    for users.
    
    >> Do you mean you want to provide a better patch for the pattern
    >> matching part? That will be helpful.
    > 
    > No guarantees that I'll find a better patch :D But yes, I will give it a
    > try.
    
    Ok.
    
    >> Because I am currently working
    >> on the aggregation part and have no time to do it. However, the
    >> aggregation work affects the v5 patch: it needs a refactoring. So can
    >> you wait until I release v6 patch? I hope it will be released in two
    >> weeks or so.
    > 
    > Absolutely!
    
    Thanks.
    
    > Heh, I think it would be pretty foolish for me to code an NFA, from
    > scratch, and then try to convince the community to maintain it.
    > 
    > But:
    > - I think we have to implement a parallel parser regardless (RPR PATTERN
    > syntax looks incompatible with POSIX)
    
    I am not sure if we need to worry about this because of the reason I
    mentioned above.
    
    > - I suspect we need more control over the backtracking than the current
    > pg_reg* API is going to give us, or else I'm worried performance is
    > going to fall off a cliff with usefully-large partitions
    
    Agreed.
    
    > - there's a lot of stuff in POSIX EREs that we don't need, and of the
    > features we do need, the + quantifier is probably one of the easiest
    > - it seems easier to prove the correctness of a slow, naive,
    > row-at-a-time engine, because we can compare it directly to the spec
    > 
    > So what I'm thinking is: if I start by open-coding the + quantifier, and
    > slowly add more pieces in, then it might be easier to see the parts of
    > src/backend/regex that I've duplicated. We can try to expose those parts
    > directly from the internal API to replace my bad implementation. And if
    > there are parts that aren't duplicated, then it'll be easier to explain
    > why we need something different from the current engine.
    > 
    > Does that seem like a workable approach? (Worst-case, my code is just
    > horrible, and we throw it in the trash.)
    
    Yes, it seems workable. I think the first cut of RPR needs at
    least the + quantifier with reasonable performance. The current naive
    implementation seems to have an issue because of exhaustive search.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
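
The dependency on earlier choices described in the message above can be
sketched in Python. This is an illustration only, not code from the
patch: given DEFINE A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A' and
B AS price > 100, the set of variables a row may be classified as
depends on which variable was chosen for the previous row. The helper
`classifiable` is hypothetical.

```python
def classifiable(price, prev_classifier):
    """Variables a row may be classified as, given the previous row's classifier."""
    result = set()
    if prev_classifier != "A":    # A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A'
        result.add("A")
    if price > 100:               # B AS price > 100
        result.add("B")
    return result


# r2 has price 110; its candidates depend on the choice made at r1.
r2_if_r1_is_a = classifiable(110, prev_classifier="A")
r2_if_r1_is_b = classifiable(110, prev_classifier="B")
```

A str_set computed once per row cannot represent r2 here: it is "B" on
one path through the match and "AB" on the other.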
    
    
    
    
  41. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-09-09T13:32:41Z

    On 9/9/23 13:21, Tatsuo Ishii wrote:
    > Thanks for the explanation.  Surprising concept of the standard:-)
    
    <quote from 19075-5:2023>
    
    This leaves the choice between traditional NFA and Posix NFA. The 
    difference between these is that a traditional NFA exits (declares a 
    match) as soon as it finds the first possible match, whereas a Posix NFA 
    is obliged to find all possible matches and then choose the “leftmost 
    longest”. There are examples that show that, even for conventional 
    regular expression matching on text strings and without back references, 
    there are patterns for which a Posix NFA is orders of magnitude slower 
    than a traditional NFA. In addition, reluctant quantifiers cannot be 
    defined in a Posix NFA, because of the leftmost longest rule.
    
    Therefore it was decided not to use the Posix NFA model, which leaves 
    the traditional NFA as the model for row pattern matching. Among 
    available tools that use traditional NFA engines, Perl is the most 
    influential; therefore Perl was adopted as the design target for pattern 
    matching rules.
    
    </quote>
    
    > Is it different from SIMILAR TO REs too?
    
    Of course it is. :-)  SIMILAR TO uses its own language and rules.
    
    > What if we don't follow the standard, instead we follow POSIX EREs?  I
    > think this is better for users unless RPR's REs have significant merit
    > for users.
    
    This would get big pushback from me.
    -- 
    Vik Fearing
    
    
    
    
    
  42. Re: Row pattern recognition

    Jacob Champion <jchampion@timescale.com> — 2023-09-11T22:13:43Z

    On Sat, Sep 9, 2023 at 4:21 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
    > Then we will get for str_set:
    > r0: B
    > r1: AB
    >
    > Because r0 only has classifier B, r1 can have A and B. The problem is
    > r2. If we choose A at r1, then r2 = B. But if we choose B at r1, then
    > r2 = AB. I guess this is the issue you pointed out.
    
    Right.
    
    > Yeah, probably we have to delay evaluation of pattern variables like
    > A, then reevaluate A after the first scan.
    >
    > What about leaving this (reevaluation) for now? Because:
    >
    > 1) we don't have CLASSIFIER
    > 2) we don't allow CLASSIFIER to be given to PREV as its argument
    >
    > so I think we don't need to worry about this for now.
    
    Sure. I'm all for deferring features to make it easier to iterate; I
    just want to make sure the architecture doesn't hit a dead end. Or at
    least, not without being aware of it.
    
    Also: is CLASSIFIER the only way to run into this issue?
    
    > What if we don't follow the standard, instead we follow POSIX EREs?  I
    > think this is better for users unless RPR's REs have significant merit
    > for users.
    
    Piggybacking off of what Vik wrote upthread, I think we would not be
    doing ourselves any favors by introducing a non-compliant
    implementation that performs worse than a traditional NFA. Those would
    be some awful bug reports.
    
    > > - I think we have to implement a parallel parser regardless (RPR PATTERN
    > > syntax looks incompatible with POSIX)
    >
    > I am not sure if we need to worry about this because of the reason I
    > mentioned above.
    
    Even if we adopted POSIX NFA semantics, we'd still have to implement
    our own parser for the PATTERN part of the query. I don't think
    there's a good way for us to reuse the parser in src/backend/regex.
    
    > > Does that seem like a workable approach? (Worst-case, my code is just
    > > horrible, and we throw it in the trash.)
    >
    > Yes, it seems workable. I think the first cut of RPR needs at
    > least the + quantifier with reasonable performance. The current naive
    > implementation seems to have an issue because of exhaustive search.
    
    +1
    
    Thanks!
    --Jacob
    
    
    
    
  43. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-12T06:18:43Z

    >> What about leaving this (reevaluation) for now? Because:
    >>
    >> 1) we don't have CLASSIFIER
    >> 2) we don't allow CLASSIFIER to be given to PREV as its argument
    >>
    >> so I think we don't need to worry about this for now.
    > 
    > Sure. I'm all for deferring features to make it easier to iterate; I
    > just want to make sure the architecture doesn't hit a dead end. Or at
    > least, not without being aware of it.
    
    Ok, let's defer this issue. Currently the patch already exceeds 3k
    lines. I am afraid a patch that is too big cannot be reviewed by
    anyone, which means it will never be committed.
    
    > Also: is CLASSIFIER the only way to run into this issue?
    
    Good question. I would like to know.
    
    >> What if we don't follow the standard, instead we follow POSIX EREs?  I
    >> think this is better for users unless RPR's REs have significant merit
    >> for users.
    > 
    > Piggybacking off of what Vik wrote upthread, I think we would not be
    > doing ourselves any favors by introducing a non-compliant
    > implementation that performs worse than a traditional NFA. Those would
    > be some awful bug reports.
    
    What I am not sure about is this: you and Vik mentioned that the
    traditional NFA is superior to the POSIX NFA in terms of performance.
    But how is "lexicographic ordering" related to performance?
    
    >> I am not sure if we need to worry about this because of the reason I
    >> mentioned above.
    > 
    > Even if we adopted POSIX NFA semantics, we'd still have to implement
    > our own parser for the PATTERN part of the query. I don't think
    > there's a good way for us to reuse the parser in src/backend/regex.
    
    Ok.
    
    >> > Does that seem like a workable approach? (Worst-case, my code is just
    >> > horrible, and we throw it in the trash.)
    >>
    >> Yes, it seems workable. I think the first cut of RPR needs at
    >> least the + quantifier with reasonable performance. The current naive
    >> implementation seems to have an issue because of exhaustive search.
    > 
    > +1
    
    BTW, attached is the v6 patch. The differences from v5 include:
    
    - Now aggregates can be used with RPR. Below is an example from the
      regression test cases, which are added by the v6 patch.
    
    - Fix assertion error pointed out by Erik.
    
    SELECT company, tdate, price,
     first_value(price) OVER w,
     last_value(price) OVER w,
     max(price) OVER w,
     min(price) OVER w,
     sum(price) OVER w,
     avg(price) OVER w,
     count(price) OVER w
    FROM stock
    WINDOW w AS (
    PARTITION BY company
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    AFTER MATCH SKIP PAST LAST ROW
    INITIAL
    PATTERN (START UP+ DOWN+)
    DEFINE
    START AS TRUE,
    UP AS price > PREV(price),
    DOWN AS price < PREV(price)
    );
     company  |   tdate    | price | first_value | last_value | max  | min | sum  |          avg          | count 
    ----------+------------+-------+-------------+------------+------+-----+------+-----------------------+-------
     company1 | 07-01-2023 |   100 |         100 |        140 |  200 | 100 |  590 |  147.5000000000000000 |     4
     company1 | 07-02-2023 |   200 |             |            |      |     |      |                       |      
     company1 | 07-03-2023 |   150 |             |            |      |     |      |                       |      
     company1 | 07-04-2023 |   140 |             |            |      |     |      |                       |      
     company1 | 07-05-2023 |   150 |             |            |      |     |      |                       |      
     company1 | 07-06-2023 |    90 |          90 |        120 |  130 |  90 |  450 |  112.5000000000000000 |     4
     company1 | 07-07-2023 |   110 |             |            |      |     |      |                       |      
     company1 | 07-08-2023 |   130 |             |            |      |     |      |                       |      
     company1 | 07-09-2023 |   120 |             |            |      |     |      |                       |      
     company1 | 07-10-2023 |   130 |             |            |      |     |      |                       |      
     company2 | 07-01-2023 |    50 |          50 |       1400 | 2000 |  50 | 4950 | 1237.5000000000000000 |     4
     company2 | 07-02-2023 |  2000 |             |            |      |     |      |                       |      
     company2 | 07-03-2023 |  1500 |             |            |      |     |      |                       |      
     company2 | 07-04-2023 |  1400 |             |            |      |     |      |                       |      
     company2 | 07-05-2023 |  1500 |             |            |      |     |      |                       |      
     company2 | 07-06-2023 |    60 |          60 |       1200 | 1300 |  60 | 3660 |  915.0000000000000000 |     4
     company2 | 07-07-2023 |  1100 |             |            |      |     |      |                       |      
     company2 | 07-08-2023 |  1300 |             |            |      |     |      |                       |      
     company2 | 07-09-2023 |  1200 |             |            |      |     |      |                       |      
     company2 | 07-10-2023 |  1300 |             |            |      |     |      |                       |      
    (20 rows)
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
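
The matched rows in the output above can be reproduced with a small
Python sketch (an illustration only, not the patch's algorithm). It
hard-codes PATTERN (START UP+ DOWN+) with AFTER MATCH SKIP PAST LAST
ROW and uses a greedy scan without backtracking, which is enough here
only because UP and DOWN can never both be true for the same row. The
function names are hypothetical, and unmatched or skipped rows are
simply reported as None.

```python
def match_length(prices, start):
    """Rows matched by PATTERN (START UP+ DOWN+) from `start`, or 0 if none."""
    i = start + 1                                          # START AS TRUE
    ups = 0
    while i < len(prices) and prices[i] > prices[i - 1]:   # UP AS price > PREV(price)
        i += 1
        ups += 1
    downs = 0
    while i < len(prices) and prices[i] < prices[i - 1]:   # DOWN AS price < PREV(price)
        i += 1
        downs += 1
    return i - start if ups and downs else 0


def rpr_aggregates(prices):
    """One entry per row: aggregates at a match's first row, None elsewhere."""
    rows = [None] * len(prices)
    pos = 0
    while pos < len(prices):
        n = match_length(prices, pos)
        if n == 0:
            pos += 1
            continue
        frame = prices[pos:pos + n]
        rows[pos] = {
            "first_value": frame[0],
            "last_value": frame[-1],
            "max": max(frame),
            "min": min(frame),
            "sum": sum(frame),
            "avg": sum(frame) / n,
            "count": n,
        }
        pos += n                                           # AFTER MATCH SKIP PAST LAST ROW
    return rows


company1 = [100, 200, 150, 140, 150, 90, 110, 130, 120, 130]
result = rpr_aggregates(company1)
```

For the company1 prices, the two matches start at 07-01 and 07-06 and
each cover four rows, in line with the first_value, last_value, max,
min, sum, avg and count columns shown in the table.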
    
  44. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-12T08:44:57Z

    Regarding v6 patch:
    
    > SELECT company, tdate, price,
    >  first_value(price) OVER w,
    >  last_value(price) OVER w,
    >  max(price) OVER w,
    >  min(price) OVER w,
    >  sum(price) OVER w,
    >  avg(price) OVER w,
    >  count(price) OVER w
    > FROM stock
    > WINDOW w AS (
    > PARTITION BY company
    > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    > AFTER MATCH SKIP PAST LAST ROW
    > INITIAL
    > PATTERN (START UP+ DOWN+)
    > DEFINE
    > START AS TRUE,
    > UP AS price > PREV(price),
    > DOWN AS price < PREV(price)
    > );
    >  company  |   tdate    | price | first_value | last_value | max  | min | sum  |          avg          | count 
    > ----------+------------+-------+-------------+------------+------+-----+------+-----------------------+-------
    >  company1 | 07-01-2023 |   100 |         100 |        140 |  200 | 100 |  590 |  147.5000000000000000 |     4
    >  company1 | 07-02-2023 |   200 |             |            |      |     |      |                       |      
    >  company1 | 07-03-2023 |   150 |             |            |      |     |      |                       |      
    >  company1 | 07-04-2023 |   140 |             |            |      |     |      |                       |      
    >  company1 | 07-05-2023 |   150 |             |            |      |     |      |                       |      
    >  company1 | 07-06-2023 |    90 |          90 |        120 |  130 |  90 |  450 |  112.5000000000000000 |     4
    >  company1 | 07-07-2023 |   110 |             |            |      |     |      |                       |      
    >  company1 | 07-08-2023 |   130 |             |            |      |     |      |                       |      
    >  company1 | 07-09-2023 |   120 |             |            |      |     |      |                       |      
    >  company1 | 07-10-2023 |   130 |             |            |      |     |      |                       |      
    >  company2 | 07-01-2023 |    50 |          50 |       1400 | 2000 |  50 | 4950 | 1237.5000000000000000 |     4
    >  company2 | 07-02-2023 |  2000 |             |            |      |     |      |                       |      
    >  company2 | 07-03-2023 |  1500 |             |            |      |     |      |                       |      
    >  company2 | 07-04-2023 |  1400 |             |            |      |     |      |                       |      
    >  company2 | 07-05-2023 |  1500 |             |            |      |     |      |                       |      
    >  company2 | 07-06-2023 |    60 |          60 |       1200 | 1300 |  60 | 3660 |  915.0000000000000000 |     4
    >  company2 | 07-07-2023 |  1100 |             |            |      |     |      |                       |      
    >  company2 | 07-08-2023 |  1300 |             |            |      |     |      |                       |      
    >  company2 | 07-09-2023 |  1200 |             |            |      |     |      |                       |      
    >  company2 | 07-10-2023 |  1300 |             |            |      |     |      |                       |      
    > (20 rows)
    
    The count column for unmatched rows should have been 0 rather than
    NULL, i.e.:
    
     company  |   tdate    | price | first_value | last_value | max  | min | sum  |          avg          | count 
    ----------+------------+-------+-------------+------------+------+-----+------+-----------------------+-------
     company1 | 07-01-2023 |   100 |         100 |        140 |  200 | 100 |  590 |  147.5000000000000000 |     4
     company1 | 07-02-2023 |   200 |             |            |      |     |      |                       |      
     company1 | 07-03-2023 |   150 |             |            |      |     |      |                       |      
     company1 | 07-04-2023 |   140 |             |            |      |     |      |                       |      
     company1 | 07-05-2023 |   150 |             |            |      |     |      |                       |     0
     company1 | 07-06-2023 |    90 |          90 |        120 |  130 |  90 |  450 |  112.5000000000000000 |     4
     company1 | 07-07-2023 |   110 |             |            |      |     |      |                       |      
     company1 | 07-08-2023 |   130 |             |            |      |     |      |                       |      
     company1 | 07-09-2023 |   120 |             |            |      |     |      |                       |      
     company1 | 07-10-2023 |   130 |             |            |      |     |      |                       |     0
     company2 | 07-01-2023 |    50 |          50 |       1400 | 2000 |  50 | 4950 | 1237.5000000000000000 |     4
     company2 | 07-02-2023 |  2000 |             |            |      |     |      |                       |      
     company2 | 07-03-2023 |  1500 |             |            |      |     |      |                       |      
     company2 | 07-04-2023 |  1400 |             |            |      |     |      |                       |      
     company2 | 07-05-2023 |  1500 |             |            |      |     |      |                       |     0
     company2 | 07-06-2023 |    60 |          60 |       1200 | 1300 |  60 | 3660 |  915.0000000000000000 |     4
     company2 | 07-07-2023 |  1100 |             |            |      |     |      |                       |      
     company2 | 07-08-2023 |  1300 |             |            |      |     |      |                       |      
     company2 | 07-09-2023 |  1200 |             |            |      |     |      |                       |      
     company2 | 07-10-2023 |  1300 |             |            |      |     |      |                       |     0
    (20 rows)
    
    Attached is the fix against v6 patch. I will include this in upcoming v7 patch.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
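
The fix follows ordinary SQL aggregate behavior: over an empty set of
rows, count yields 0 while sum, max, min and avg yield NULL. A minimal
Python sketch of that rule, with `aggregate_frame` as a hypothetical
helper and None standing in for NULL (this is not code from the patch):

```python
def aggregate_frame(frame):
    """Aggregate a (possibly empty) list of prices the way SQL aggregates would."""
    if not frame:
        # Empty frame (unmatched row): count is 0, everything else is NULL.
        return {"count": 0, "sum": None, "max": None, "min": None, "avg": None}
    return {
        "count": len(frame),
        "sum": sum(frame),
        "max": max(frame),
        "min": min(frame),
        "avg": sum(frame) / len(frame),
    }


unmatched = aggregate_frame([])
matched = aggregate_frame([100, 200, 150, 140])
```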
    
  45. Re: Row pattern recognition

    Jacob Champion <jchampion@timescale.com> — 2023-09-12T22:09:29Z

    On Mon, Sep 11, 2023 at 11:18 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
    > What I am not sure about is this: you and Vik mentioned that the
    > traditional NFA is superior to the POSIX NFA in terms of performance.
    > But how is "lexicographic ordering" related to performance?
    
    I think they're only tangentially related. POSIX NFAs have to fully
    backtrack even after the first match is found, so that's where the
    performance difference comes in. (We would be introducing new ways to
    catastrophically backtrack if we used that approach.) But since you
    don't visit every possible path through the graph with a traditional
    NFA, it makes sense to define an order in which you visit the nodes,
    so that you can reason about which string is actually going to be
    matched in the end.
    
    > BTW, attached is the v6 patch. The differences from v5 include:
    >
    > - Now aggregates can be used with RPR. Below is an example from the
    >   regression test cases, which are added by the v6 patch.
    
    Great, thank you!
    
    --Jacob
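
A toy Python model of the difference described above, restricted to a
single alternation of literal strings. Nothing here comes from the
patch or from src/backend/regex, and both function names are
hypothetical. The traditional matcher stops at the first alternative
that matches, so the order of alternatives decides the result; the
POSIX-style matcher has to try every alternative before it can pick
the longest one.

```python
def traditional_match(alternatives, text):
    """Return (match, alternatives tried); stop at the first alternative that matches."""
    tried = 0
    for alt in alternatives:
        tried += 1
        if text.startswith(alt):
            return alt, tried
    return None, tried


def posix_match(alternatives, text):
    """Return (match, alternatives tried); try everything, keep the longest match."""
    tried = 0
    best = None
    for alt in alternatives:
        tried += 1
        if text.startswith(alt) and (best is None or len(alt) > len(best)):
            best = alt
    return best, tried


first = traditional_match(["a", "ab"], "ab")
longest = posix_match(["a", "ab"], "ab")
```

With the alternation a|ab on the text "ab", the traditional matcher
returns "a" after one attempt, while the POSIX-style matcher returns
"ab" after trying both alternatives.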
    
    
    
    
  46. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-13T05:14:07Z

    > <quote from 19075-5:2023>
    
    I was looking for this but I only found ISO/IEC 19075-5:2021.
    https://www.iso.org/standard/78936.html
    
    Maybe 19075-5:2021 is the latest one?
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  47. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2023-09-13T11:28:45Z

    On 9/13/23 07:14, Tatsuo Ishii wrote:
    >> <quote from 19075-5:2023>
    > 
    > I was looking for this but I only found ISO/IEC 19075-5:2021.
    > https://www.iso.org/standard/78936.html
    > 
    > Maybe 19075-5:2021 is the latest one?
    
    Yes, probably.  Sorry.
    -- 
    Vik Fearing
    
    
    
    
    
  48. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-13T12:35:53Z

    > On 9/13/23 07:14, Tatsuo Ishii wrote:
    >>> <quote from 19075-5:2023>
    >> I was looking for this but I only found ISO/IEC 19075-5:2021.
    >> https://www.iso.org/standard/78936.html
    >> Maybe 19075-5:2021 is the latest one?
    > 
    > Yes, probably.  Sorry.
    
    No problem. Thanks for confirmation.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  49. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-22T05:16:40Z

    > Attached is the fix against v6 patch. I will include this in upcoming v7 patch.
    
    Attached is the v7 patch. It includes the fix mentioned above.  Also
    this time the pattern matching engine is enhanced: previously it
    recursed in the row direction, which means that if the number of rows
    in a frame is large, it could exceed the stack depth limit.  The new
    version recurses over the matched pattern variables in a row; there
    are at most 26 of them, which should be small enough.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  50. Re: Row pattern recognition

    Erik Rijkers <er@xs4all.nl> — 2023-09-22T08:12:38Z

    On 9/22/23 at 07:16, Tatsuo Ishii wrote:
    >> Attached is the fix against v6 patch. I will include this in upcoming v7 patch.
    > 
    > Attached is the v7 patch. It includes the fix mentioned above.  Also
    
    Hi,
    
    In my hands, make check fails on the rpr test; see attached .diff file.
    In these two statements:
    -- using NEXT
    -- using AFTER MATCH SKIP TO NEXT ROW
       the results of first_value(price) and next_value(price) are empty.
    
    
    Erik Rijkers
    
    
    > this time the pattern matching engine is enhanced: previously it
    > recursed in the row direction, which means that if the number of rows
    > in a frame is large, it could exceed the stack depth limit.  The new
    > version recurses over the matched pattern variables in a row; there
    > are at most 26 of them, which should be small enough.
    > 
    > Best regards,
    > --
    > Tatsuo Ishii
    > SRA OSS LLC
    > English: http://www.sraoss.co.jp/index_en/
    > Japanese:http://www.sraoss.co.jp
  51. Re: Row pattern recognition

    Erik Rijkers <er@xs4all.nl> — 2023-09-22T08:23:11Z

    On 9/22/23 at 07:16, Tatsuo Ishii wrote:
    >> Attached is the fix against v6 patch. I will include this in upcoming v7 patch.
    > 
    > Attached is the v7 patch. It includes the fix mentioned above.  Also
    (Champion's address bounced; removed)
    
    Hi,
    
    In my hands, make check fails on the rpr test; see attached .diff file.
    In these two statements:
    -- using NEXT
    -- using AFTER MATCH SKIP TO NEXT ROW
       the results of first_value(price) and next_value(price) are empty.
    
    Erik Rijkers
    
    
    > this time the pattern matching engine is enhanced: previously it
    > recursed to row direction, which means if the number of rows in a
    > frame is large, it could exceed the limit of stack depth.  The new
    > version recurses over matched pattern variables in a row, which is at
    > most 26 which should be small enough.
    > 
    > Best regards,
    > --
    > Tatsuo Ishii
    > SRA OSS LLC
    > English: http://www.sraoss.co.jp/index_en/
    > Japanese:http://www.sraoss.co.jp
    
    
    
    
  52. Re: Row pattern recognition

    Erik Rijkers <er@xs4all.nl> — 2023-09-22T08:26:49Z

    Op 9/22/23 om 10:23 schreef Erik Rijkers:
    > Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
    >>> Attached is the fix against v6 patch. I will include this in upcoming 
    >>> v7 patch.
    >>
    >> Attached is the v7 patch. It includes the fix mentioned above.  Also
    > (Champion's address bounced; removed)
    > 
    
    Sorry, I forgot to re-attach the regression.diffs with resend...
    
    Erik
    
    > Hi,
    > 
    > In my hands, make check fails on the rpr test; see attached .diff file.
    > In these two statements:
    > -- using NEXT
    > -- using AFTER MATCH SKIP TO NEXT ROW
    >    result of first_value(price) and next_value(price) are empty.
    > 
    > Erik Rijkers
    > 
    > 
    >> this time the pattern matching engine is enhanced: previously it
    >> recursed to row direction, which means if the number of rows in a
    >> frame is large, it could exceed the limit of stack depth.  The new
    >> version recurses over matched pattern variables in a row, which is at
    >> most 26 which should be small enough.
    >>
    >> Best regards,
    >> -- 
    >> Tatsuo Ishii
    >> SRA OSS LLC
    >> English: http://www.sraoss.co.jp/index_en/
    >> Japanese:http://www.sraoss.co.jp
    > 
    > 
  53. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-22T10:12:50Z

    > Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
    >>> Attached is the fix against v6 patch. I will include this in upcoming
    >>> v7 patch.
    >> Attached is the v7 patch. It includes the fix mentioned above.  Also
    > (Champion's address bounced; removed)
    
    On my side his address bounced too:-<
    
    > Hi,
    > 
    > In my hands, make check fails on the rpr test; see attached .diff
    > file.
    > In these two statements:
    > -- using NEXT
    > -- using AFTER MATCH SKIP TO NEXT ROW
    >   result of first_value(price) and next_value(price) are empty.
    
    Strange. I have checked out fresh master branch and applied the v7
    patches, then ran make check. All tests including the rpr test
    passed. This is Ubuntu 20.04.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  54. Re: Row pattern recognition

    Erik Rijkers <er@xs4all.nl> — 2023-09-22T11:28:11Z

    Op 9/22/23 om 12:12 schreef Tatsuo Ishii:
    >> Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
    >>>> Attached is the fix against v6 patch. I will include this in upcoming
    >>>> v7 patch.
    >>> Attached is the v7 patch. It includes the fix mentioned above.  Also
    >> (Champion's address bounced; removed)
    > 
    > On my side his address bounced too:-<
    > 
    >> Hi,
    >>
    >> In my hands, make check fails on the rpr test; see attached .diff
    >> file.
    >> In these two statements:
    >> -- using NEXT
    >> -- using AFTER MATCH SKIP TO NEXT ROW
    >>    result of first_value(price) and next_value(price) are empty.
    > 
    > Strange. I have checked out fresh master branch and applied the v7
    > patches, then ran make check. All tests including the rpr test
    > passed. This is Ubuntu 20.04.
    
    The curious thing is that the server otherwise builds ok, and if I
    explicitly run 'CREATE TEMP TABLE stock' plus the 20 INSERTs on that
    server (just to make sure to have known data), those two statements now
    both return the correct result.
    
    So maybe the testing/timing is wonky (not necessarily the server).
    
    Erik
    
    > 
    > Best regards,
    > --
    > Tatsuo Ishii
    > SRA OSS LLC
    > English: http://www.sraoss.co.jp/index_en/
    > Japanese:http://www.sraoss.co.jp
    
    
    
    
  55. Re: Row pattern recognition

    Jacob Champion <champion.p@gmail.com> — 2023-09-22T14:48:22Z

    On Fri, Sep 22, 2023, 3:13 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
    
    > > Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
    > >>> Attached is the fix against v6 patch. I will include this in upcoming
    > >>> v7 patch.
    > >> Attached is the v7 patch. It includes the fix mentioned above.  Also
    > > (Champion's address bounced; removed)
    >
    > On my side his address bounced too:-<
    >
    
    Yep. I'm still here, just lurking for now. It'll take a little time for me
    to get back to this thread, as my schedule has changed significantly. :D
    
    Thanks,
    --Jacob
    
    >
    
  56. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-25T05:26:30Z

    > On Fri, Sep 22, 2023, 3:13 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
    > 
    >> > Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
    >> >>> Attached is the fix against v6 patch. I will include this in upcoming
    >> >>> v7 patch.
    >> >> Attached is the v7 patch. It includes the fix mentioned above.  Also
    >> > (Champion's address bounced; removed)
    >>
    >> On my side his address bounced too:-<
    >>
    > 
    > Yep. I'm still here, just lurking for now. It'll take a little time for me
    > to get back to this thread, as my schedule has changed significantly. :D
    
    Hope you get back soon...
    
    By the way, I was thinking about eliminating recursive calls in
    pattern matching. Attached is the first cut of the implementation. In
    the attached v8 patch:
    
    - No recursive calls anymore. search_str_set_recurse was removed.
    
    - Instead it generates all possible strings of pattern variable
      initials before pattern matching. Suppose we have "ab" (row 0) and
      "ac" (row 1), where each letter is the initial of a pattern
      variable that evaluated to true on that row. In this case it
      generates "aa", "ac", "ba" and "bc", and do_pattern_match examines
      them one by one to perform the pattern matching.
    
    - To implement this, a string_set* infrastructure was created. It
      manages a set of StringInfo.
    
    I found that the previous implementation did not search all possible
    cases. I believe the bug is fixed in v8.
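    
    To make the candidate generation described above concrete, here is a
    minimal Python sketch. It is not the patch's C code: the function name
    generate_candidates is hypothetical, and plain Python strings stand in
    for the StringInfo sets. It only shows that the candidates are the
    Cartesian product of the per-row initials.

```python
from itertools import product


def generate_candidates(rows):
    """Return every string formed by picking one initial per row.

    `rows` is a list of strings; each string holds the initials of the
    pattern variables that evaluated to true on that row.
    """
    return ["".join(choice) for choice in product(*rows)]


# Row 0 has "a" and "b" true, row 1 has "a" and "c" true.
print(generate_candidates(["ab", "ac"]))  # ['aa', 'ac', 'ba', 'bc']
```

    Each resulting string would then be checked against the PATTERN one by
    one, which is the role do_pattern_match plays in the description above.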
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  57. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-10-04T06:03:28Z

    > By the way, I was thinking about eliminating recursive calls in
    > pattern matching. Attached is the first cut of the implementation. In
    > the attached v8 patch:
    > 
    > - No recursive calls anymore. search_str_set_recurse was removed.
    > 
    > - Instead it generates all possible strings of pattern variable
    >   initials before pattern matching. Suppose we have "ab" (row 0) and
    >   "ac" (row 1), where each letter is the initial of a pattern
    >   variable that evaluated to true on that row. In this case it
    >   generates "aa", "ac", "ba" and "bc", and do_pattern_match examines
    >   them one by one to perform the pattern matching.
    > 
    > - To implement this, a string_set* infrastructure was created. It
    >   manages a set of StringInfo.
    > 
    > I found that the previous implementation did not search all possible
    > cases. I believe the bug is fixed in v8.
    
    The v8 patch does not apply anymore due to commit d060e921ea "Remove obsolete executor cleanup code".
    So I rebased and created v9 patch. The differences from the v8 include:
    
    - Fix bug with get_slots. It did not correctly detect the end of full frame.
    - Add test case using "ROWS BETWEEN CURRENT ROW AND offset FOLLOWING".
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  58. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-10-22T02:39:20Z

    Attached is the v10 patch. This version improves the performance of
    pattern matching.  Previously it generated all possible pattern string
    candidates, which resulted in an unnecessarily large number of
    candidates. For example, if you have 2 pattern variables and the target
    frame includes 100 rows, the number of candidates can reach 2^100
    in the worst case. To avoid this, the v10 patch prunes candidates.
    Suppose you have:
    
    PATTERN (A B+ C+)
    
    Candidates like "BAC" and "CAB" cannot survive because they never satisfy
    the search pattern. To judge this, I assign sequence numbers (0, 1, 2)
    to (A B C).  If the pattern generator tries to generate BA, this is
    not allowed because the sequence number for B is 1 and for A is 0, and
    0 < 1: B cannot be followed by A. Note that this technique can be
    applied when the quantifiers are "+" or "*". It may also apply to other
    quantifiers such as '?' or '{n, m}', but I have not confirmed this
    because I have not implemented them yet.
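    
    The sequence-number pruning described above can be sketched in Python
    as follows. This is an illustration only, not the v10 C code: the name
    generate_pruned is hypothetical, and the sketch assumes each pattern
    variable appears only once in PATTERN and uses only "+" or "*".

```python
def generate_pruned(rows, pattern_vars):
    """Generate candidate strings, dropping any prefix whose sequence
    numbers decrease (e.g. "BA" when the pattern order is A, B, C).

    rows: list of strings, the variables true on each row
    pattern_vars: the variables in PATTERN order, e.g. "ABC"
    """
    seq = {var: i for i, var in enumerate(pattern_vars)}
    candidates = [""]
    for row in rows:
        next_candidates = []
        for prefix in candidates:
            for var in row:
                if prefix and seq[var] < seq[prefix[-1]]:
                    continue  # var would precede the previous variable
                next_candidates.append(prefix + var)
        candidates = next_candidates
    return candidates


# All three variables true on each of 3 rows: 27 unpruned candidates.
print(len(generate_pruned(["ABC"] * 3, "ABC")))  # 10
```

    With two variables true on all 100 rows, the unpruned count is 2^100,
    while this sketch keeps only 101 candidates. Surviving candidates
    (for example "AAA") still have to be matched against the pattern; the
    pruning only discards strings that can never match.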
    
    Besides this improvement, I fixed a bug in the previous and older
    patches: when an expression in DEFINE uses text operators, it errors
    out:
    
    ERROR:  could not determine which collation to use for string comparison
    HINT:  Use the COLLATE clause to set the collation explicitly.
    
    This was fixed by adding assign_expr_collations() in
    transformDefineClause().
    
    Also I have updated the documentation, "3.5. Window Functions":
    
    - It still mentioned rpr(), which is no longer used.
    - Enhance the description of DEFINE and PATTERN.
    - Mention that quantifier '*' is supported.
    
    Finally I have added more test cases to the regression test.
    - same pattern variable appears twice
    - case for quantifier '*'
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  59. Re: Row pattern recognition

    Jacob Champion <champion.p@gmail.com> — 2023-10-24T18:51:19Z

    On Sat, Oct 21, 2023 at 7:39 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
    > Attached is the v10 patch. This version enhances the performance of
    > pattern matching.
    
    Nice! I've attached a couple of more stressful tests (window
    partitions of 1000 rows each). Beware that the second one runs my
    desktop out of memory fairly quickly with the v10 implementation.
    
    I was able to carve out some time this week to implement a very basic
    recursive NFA, which handles both the + and * quantifiers (attached).
    It's not production quality -- a frame on the call stack for every row
    isn't going to work -- but with only those two features, it's pretty
    tiny, and it's able to run the new stress tests with no issue. If I
    continue to have time, I hope to keep updating this parallel
    implementation as you add features to the StringSet implementation,
    and we can see how it evolves. I expect that alternation and grouping
    will ratchet up the complexity.
    
    Thanks!
    --Jacob
    
  60. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-10-25T00:11:05Z

    > On Sat, Oct 21, 2023 at 7:39 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
    >> Attached is the v10 patch. This version enhances the performance of
    >> pattern matching.
    > 
    > Nice! I've attached a couple of more stressful tests (window
    > partitions of 1000 rows each). Beware that the second one runs my
    > desktop out of memory fairly quickly with the v10 implementation.
    > 
    > I was able to carve out some time this week to implement a very basic
    > recursive NFA, which handles both the + and * quantifiers (attached).
    
    Great. I will look into this.
    
    > It's not production quality -- a frame on the call stack for every row
    > isn't going to work
    
    Yeah.
    
    > -- but with only those two features, it's pretty
    > tiny, and it's able to run the new stress tests with no issue. If I
    > continue to have time, I hope to keep updating this parallel
    > implementation as you add features to the StringSet implementation,
    > and we can see how it evolves. I expect that alternation and grouping
    > will ratchet up the complexity.
    
    Sounds like a plan.
    
    By the way, I tested my patch (v10) against a larger data set by
    running the following query on a pgbench database. On my laptop it works
    with a 100k-row pgbench_accounts table, but beyond that number the OOM
    killer kicks in. I would like to improve this in the next patch.
    
    SELECT aid, first_value(aid) OVER w,
    count(*) OVER w
    FROM pgbench_accounts
    WINDOW w AS (
    PARTITION BY bid
    ORDER BY aid
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    AFTER MATCH SKIP PAST LAST ROW
    INITIAL
    PATTERN (START UP+)
    DEFINE
    START AS TRUE,
    UP AS aid > PREV(aid)
    );
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  61. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-10-25T02:49:30Z

    > Great. I will look into this.
    
    I am impressed by the simple NFA implementation.  It would be nicer if it
    could be implemented without using recursion.
    
    > By the way, I tested my patch (v10) to handle more large data set and
    > tried to following query with pgbench database. On my laptop it works
    > with 100k rows pgbench_accounts table but with beyond the number I got
           ~~~ I meant 10k.
    
    > OOM killer. I would like to enhance this in the next patch.
    > 
    > SELECT aid, first_value(aid) OVER w,
    > count(*) OVER w
    > FROM pgbench_accounts
    > WINDOW w AS (
    > PARTITION BY bid
    > ORDER BY aid
    > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    > AFTER MATCH SKIP PAST LAST ROW
    > INITIAL
    > PATTERN (START UP+)
    > DEFINE
    > START AS TRUE,
    > UP AS aid > PREV(aid)
    > );
    
    I ran this against your patch. It failed once the table exceeded roughly 60k rows.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  62. Re: Row pattern recognition

    Jacob Champion <champion.p@gmail.com> — 2023-10-30T19:49:18Z

    On Tue, Oct 24, 2023 at 7:49 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
    > I am impressed by the simple NFA implementation.
    
    Thanks!
    
    > It would be nicer if it
    > could be implemented without using recursion.
    
    Yeah. If for some reason we end up going with a bespoke
    implementation, I assume we'd just convert the algorithm to an
    iterative one and optimize it heavily. But I didn't want to do that
    too early, since it'd probably make it harder to add new features...
    and anyway my goal is still to try to reuse src/backend/regex
    eventually.
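    
    For readers wondering what a non-recursive variant could look like,
    here is a minimal Python sketch of an iterative state-set NFA
    simulation. It is an assumption-laden illustration, neither the NFA
    attached to this thread nor src/backend/regex: the function name
    longest_match and the (variable, quantifier) pattern encoding are
    hypothetical, and only "+", "*" and no quantifier are handled.

```python
def longest_match(pattern, rows):
    """Return the number of leading rows in the longest match, or 0.

    pattern: list of (variable, quantifier), quantifier in {"", "+", "*"}
    rows: list of sets; each set holds the variables true on that row
    """
    n = len(pattern)

    def closure(states):
        # Follow "move to the next element" transitions without
        # consuming a row, using an explicit stack instead of recursion.
        stack, reached = list(states), set(states)
        while stack:
            i, matched = stack.pop()
            if i < n and (matched or pattern[i][1] == "*"):
                nxt = (i + 1, False)
                if nxt not in reached:
                    reached.add(nxt)
                    stack.append(nxt)
        return reached

    # A state is (element index, element already matched at least once).
    current = closure({(0, False)})
    best = 0
    for row_no, true_vars in enumerate(rows, start=1):
        stepped = set()
        for i, matched in current:
            if i == n:
                continue  # accepting state, nothing left to consume
            var, quant = pattern[i]
            repeatable = quant in ("+", "*")
            if var in true_vars and (not matched or repeatable):
                stepped.add((i, True))
        if not stepped:
            break
        current = closure(stepped)
        if any(i == n for i, _ in current):
            best = row_no  # remember the longest accepting prefix
    return best


# PATTERN (START UP+) over 100,000 rows; call depth stays constant.
rows = [{"START"}] + [{"START", "UP"}] * 99_999
print(longest_match([("START", ""), ("UP", "+")], rows))  # 100000
```

    Because the set of live states is bounded by the pattern length, the
    memory used here does not grow with the number of rows in the frame.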
    
    > > SELECT aid, first_value(aid) OVER w,
    > > count(*) OVER w
    > > FROM pgbench_accounts
    > > WINDOW w AS (
    > > PARTITION BY bid
    > > ORDER BY aid
    > > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    > > AFTER MATCH SKIP PAST LAST ROW
    > > INITIAL
    > > PATTERN (START UP+)
    > > DEFINE
    > > START AS TRUE,
    > > UP AS aid > PREV(aid)
    > > );
    >
    > I ran this against your patch. It failed once the table exceeded roughly 60k rows.
    
    Nice, that's actually more frames than I expected. Looks like I have
    similar results here with my second test query (segfault at ~58k
    rows).
    
    Thanks,
    --Jacob
    
    
    
    
  63. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-11-08T07:37:05Z

    >> It would be nicer if it
    >> could be implemented without using recursion.
    > 
    > Yeah. If for some reason we end up going with a bespoke
    > implementation, I assume we'd just convert the algorithm to an
    > iterative one and optimize it heavily. But I didn't want to do that
    > too early, since it'd probably make it harder to add new features...
    > and anyway my goal is still to try to reuse src/backend/regex
    > eventually.
    
    Ok.
    
    Attached is the v11 patch. Below is a summary of the changes from the
    previous version.
    
    - Rebase.
    
    - Reduce memory allocation in pattern matching (search_str_set()).
      Champion's second stress test still triggers the OOM killer, though.
        
      - When carrying an old set over to the next round, move the
        StringInfo to new_str_set rather than copying it from
        old_str_set. This allows running pgbench.sql against up to 60k
        rows on my laptop (previously 20k).
        
      - Use enlargeStringInfo to set the buffer size up front, rather than
        enlarging the buffer incrementally. This does not seem to give a
        big improvement, but in theory it should help.
    
    - Fix the "variable not found in subplan target list" error when WITH
      is used.
        
      - To fix this, apply pullup_replace_vars() to the DEFINE clause in
        the planning phase (perform_pullup_replace_vars()).  Also add
        regression test cases for the WITH queries that caused the error
        in the previous version.
    
    - Fix the case where no greedy quantifiers ('+' or '*') are included
      in PATTERN.
        
      - Previously update_reduced_frame() did not consider this case and
        produced wrong results. Add another code path dedicated to
        non-greedy PATTERNs (at this point, that means patterns with no
        quantifier). Also add a test case for this.
    
    - Remove an unnecessary check in transformPatternClause().
    
      - Previously it checked whether all pattern variables were defined
        in the DEFINE clause. But RPR currently allows such variables to
        be "auto defined" as "varname AS TRUE", so the check was not
        necessary.
    
    - FYI, here is a list explaining what was changed in each patch file.
    
    0001-Row-pattern-recognition-patch-for-raw-parser.patch
    - same
    
    0002-Row-pattern-recognition-patch-parse-analysis.patch
    - Add markTargetListOrigins() to transformFrameOffset().
    - Change transformPatternClause().
    
    0003-Row-pattern-recognition-patch-planner.patch
    - Fix perform_pullup_replace_vars()
    
    0004-Row-pattern-recognition-patch-executor.patch
    - Fix update_reduced_frame()
    - Fix search_str_set()
    
    0005-Row-pattern-recognition-patch-docs.patch
    - same
    
    0006-Row-pattern-recognition-patch-tests.patch
    - Add test case for non-greedy and WITH cases
    
    0007-Allow-to-print-raw-parse-tree.patch
    - same
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  64. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-12-08T01:16:13Z

    Sorry for posting the v12 patch again. It seems the previous v12
    patch email lost its mail threading information and was not recognized
    as part of the thread by the CF application and CFbot.
    https://www.postgresql.org/message-id/20231204.204048.1998548830490453126.t-ishii%40sranhm.sra.co.jp
    
    Attached is the v12 patch. Below is a summary of the changes from the
    previous version.
    
    - Rebase. CFbot says the v11 patch has needed a rebase since Nov 30, 2023.
     
    - Apply preprocess_expression() to the DEFINE clause in the planning
      phase.  This is necessary to simplify constant expressions like:
    
        DEFINE A AS price < (99 + 1)
        to:
        DEFINE A AS price < 100
    
    - Re-allow the use of WinSetMarkPosition() in eval_windowaggregates().
    
    - FYI, here is a list explaining what was changed in each patch file.
    
    0001-Row-pattern-recognition-patch-for-raw-parser.patch
    - Fix conflict.
    
    0002-Row-pattern-recognition-patch-parse-analysis.patch
    - Same as before.
    
    0003-Row-pattern-recognition-patch-planner.patch
    - Call preprocess_expression() for DEFINE clause in subquery_planner().
    
    0004-Row-pattern-recognition-patch-executor.patch
    - Re-allow the use of WinSetMarkPosition() in eval_windowaggregates().
    
    0005-Row-pattern-recognition-patch-docs.patch
    - Same as before.
    
    0006-Row-pattern-recognition-patch-tests.patch
    - Same as before.
    
    0007-Allow-to-print-raw-parse-tree.patch
    - Same as before.
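    
    The constant simplification mentioned in the change summary above
    (turning "price < (99 + 1)" into "price < 100") is ordinary constant
    folding. The Python sketch below illustrates the idea only; it says
    nothing about how preprocess_expression() is implemented. Python
    expression syntax stands in for the SQL expression, and the names
    fold_constants, _Folder and _OPS are hypothetical.

```python
import ast
import operator

# Hypothetical helper for illustration; only +, - and * are folded.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class _Folder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        if (type(node.op) in _OPS
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            value = _OPS[type(node.op)](node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value, kind=None), node)
        return node


def fold_constants(expr):
    """Return `expr` with constant arithmetic sub-expressions evaluated."""
    tree = _Folder().visit(ast.parse(expr, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree))


print(fold_constants("price < (99 + 1)"))  # price < 100
```

    Folding once at planning time means the comparison against the
    constant does not have to be recomputed for every row the DEFINE
    expression is evaluated on.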
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  65. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-12-08T22:22:58Z

    > On 04.12.23 12:40, Tatsuo Ishii wrote:
    >> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
    >> index d631ac89a9..5a77fca17f 100644
    >> --- a/src/backend/parser/gram.y
    >> +++ b/src/backend/parser/gram.y
    >> @@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char
    >> *relname, List *aliases, Node *query);
    >>   	DefElem	   *defelt;
    >>   	SortBy	   *sortby;
    >>   	WindowDef  *windef;
    >> +	RPCommonSyntax	*rpcom;
    >> +	RPSubsetItem	*rpsubset;
    >>   	JoinExpr   *jexpr;
    >>   	IndexElem  *ielem;
    >>   	StatsElem  *selem;
    >> @@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char
    >> *relname, List *aliases, Node *query);
    >>   	MergeWhenClause *mergewhen;
    >>   	struct KeyActions *keyactions;
    >>   	struct KeyAction *keyaction;
    >> +	RPSkipTo	skipto;
    >>   }
    >>     %type <node>	stmt toplevel_stmt schema_stmt routine_body_stmt
    > 
    > It is usually not the style to add an entry for every node type to the
    > %union.  Otherwise, we'd have hundreds of entries in there.
    
    Ok, I have removed the node types and used existing node types.  Also,
    I have moved the RPR-related %types to one place to make it easier to
    see what is added by RPR.
    
    >> @@ -866,6 +878,7 @@ static Node *makeRecursiveViewSelect(char
    >> *relname, List *aliases, Node *query);
    >>   %nonassoc UNBOUNDED /* ideally would have same precedence as IDENT */
    >>   %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP
    >>   			SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
    >> +%nonassoc	MEASURES AFTER INITIAL SEEK PATTERN_P
    >>   %left Op OPERATOR /* multi-character ops and user-defined operators */
    >>   %left		'+' '-'
    >>   %left		'*' '/' '%'
    > 
    > It was recently discussed that these %nonassoc should ideally all have
    > the same precedence.  Did you consider that here?
    
    No, I didn't realize it. Thanks for pointing it out. I have removed
    %nonassoc so that MEASURES etc. have the same precedence as IDENT etc.
    
    Attached is the new diff of gram.y against master branch.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
  66. Re: Row pattern recognition

    NINGWEI CHEN <chen@sraoss.co.jp> — 2024-01-22T05:51:49Z

    On Sat, 09 Dec 2023 07:22:58 +0900 (JST)
    Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
    
    > > On 04.12.23 12:40, Tatsuo Ishii wrote:
    > >> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
    > >> index d631ac89a9..5a77fca17f 100644
    > >> --- a/src/backend/parser/gram.y
    > >> +++ b/src/backend/parser/gram.y
    > >> @@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char
    > >> *relname, List *aliases, Node *query);
    > >>   	DefElem	   *defelt;
    > >>   	SortBy	   *sortby;
    > >>   	WindowDef  *windef;
    > >> +	RPCommonSyntax	*rpcom;
    > >> +	RPSubsetItem	*rpsubset;
    > >>   	JoinExpr   *jexpr;
    > >>   	IndexElem  *ielem;
    > >>   	StatsElem  *selem;
    > >> @@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char
    > >> *relname, List *aliases, Node *query);
    > >>   	MergeWhenClause *mergewhen;
    > >>   	struct KeyActions *keyactions;
    > >>   	struct KeyAction *keyaction;
    > >> +	RPSkipTo	skipto;
    > >>   }
    > >>     %type <node>	stmt toplevel_stmt schema_stmt routine_body_stmt
    > > 
    > > It is usually not the style to add an entry for every node type to the
    > > %union.  Otherwise, we'd have hundreds of entries in there.
    > 
    > Ok, I have removed the node types and used existing node types.  Also,
    > I have moved the RPR-related %types to one place to make it easier to
    > see what is added by RPR.
    > 
    > >> @@ -866,6 +878,7 @@ static Node *makeRecursiveViewSelect(char
    > >> *relname, List *aliases, Node *query);
    > >>   %nonassoc UNBOUNDED /* ideally would have same precedence as IDENT */
    > >>   %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP
    > >>   			SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
    > >> +%nonassoc	MEASURES AFTER INITIAL SEEK PATTERN_P
    > >>   %left Op OPERATOR /* multi-character ops and user-defined operators */
    > >>   %left		'+' '-'
    > >>   %left		'*' '/' '%'
    > > 
    > > It was recently discussed that these %nonassoc should ideally all have
    > > the same precedence.  Did you consider that here?
    > 
    > No, I didn't realize it. Thanks for pointing it out. I have removed
    > %nonassoc so that MEASURES etc. have the same precedence as IDENT etc.
    > 
    > Attached is the new diff of gram.y against master branch.
    
    Thank you very much for providing the patch for the RPR implementation.
    
    After applying the v12-patches, I noticed an issue that
    the rpr related parts in window clauses were not displayed in the
    view definitions (the definition column of pg_views).
    
    To address this, I have taken the liberty of adding an additional patch
    that modifies the relevant rewriter source code.
    I have attached the rewriter patch for your review and would greatly appreciate your feedback.
    
    Thank you for your time and consideration.
    
    -- 
    SRA OSS LLC
    Ningwei Chen <chen@sraoss.co.jp>
    TEL: 03-5979-2701 FAX: 03-5979-2702
    
  67. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-01-22T06:22:11Z

    > Thank you very much for providing the patch for the RPR implementation.
    > 
    > After applying the v12-patches, I noticed an issue that
    > the rpr related parts in window clauses were not displayed in the
    > view definitions (the definition column of pg_views).
    > 
    > To address this, I have taken the liberty of adding an additional patch
    > that modifies the relevant rewriter source code.
    > I have attached the rewriter patch for your review and would greatly appreciate your feedback.
    > 
    > Thank you for your time and consideration.
    
    Thank you so much for spotting the issue and creating the patch. I
    confirmed that your patch applies cleanly and solves the issue. I will
    include it in the upcoming v13 patches.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  68. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-01-22T10:26:18Z

    Attached is the v13 patch. Below is a summary of the changes from the
    previous version (besides the rebase).
    
    0001-Row-pattern-recognition-patch-for-raw-parser.patch
    - Fix the raw parser per Peter Eisentraut's review. Remove the new node
      types and use existing ones. Also remove %nonassoc so that
      MEASURES etc. have the same precedence as IDENT etc.
    
    Peter's comment:
    > It is usually not the style to add an entry for every node type to the
    > %union.  Otherwise, we'd have hundreds of entries in there.
    
    > It was recently discussed that these %nonassoc should ideally all have
    > the same precedence.  Did you consider that here?
    
    0002-Row-pattern-recognition-patch-parse-analysis.patch
    - Fix transformRPR so that the SKIP variable name in the AFTER MATCH SKIP
      TO clause is tracked. This was added by Ningwei Chen.
    
    0003-Row-pattern-recognition-patch-rewriter.patch
    This is a new patch for rewriter. Contributed by Ningwei Chen.
    
    Chen's comment:
    > After applying the v12-patches, I noticed an issue that
    > the rpr related parts in window clauses were not displayed in the
    > view definitions (the definition column of pg_views).
    
    0004-Row-pattern-recognition-patch-planner.patch
    - same as before (previously it was 0003-Row-pattern-recognition-patch-planner.patch)
    
    0005-Row-pattern-recognition-patch-executor.patch
    - same as before (previously it was 0004-Row-pattern-recognition-patch-executor.patch)
    
    0006-Row-pattern-recognition-patch-docs.patch
    - Same as before. (previously it was 0005-Row-pattern-recognition-patch-docs.patch)
    
    0007-Row-pattern-recognition-patch-tests.patch
    - Same as before. (previously it was 0006-Row-pattern-recognition-patch-tests.patch)
    
    0008-Allow-to-print-raw-parse-tree.patch
    - Same as before. (previously it was 0007-Allow-to-print-raw-parse-tree.patch).
      Note that this patch is not intended to be incorporated into the main
      tree. It is just for debugging purposes. With this patch, the raw parse
      tree is printed if debug_print_parse is enabled.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  69. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-02-29T00:19:54Z

    Attached is the v14 patch. Below is a summary of the changes from the
    previous version (besides the rebase).
    The v14 patches are mainly coding style fixes.
    
    0001-Row-pattern-recognition-patch-for-raw-parser.patch
    - Fold too long lines and run pgindent.
    
    0002-Row-pattern-recognition-patch-parse-analysis.patch
    - Fold too long lines and run pgindent.
    
    0003-Row-pattern-recognition-patch-rewriter.patch
    - Fold too long lines and run pgindent.
    
    0004-Row-pattern-recognition-patch-planner.patch
    - Fold too long lines and run pgindent.
    
    0005-Row-pattern-recognition-patch-executor.patch
    - Fold too long lines and run pgindent.
    
    - Surround debug lines with "ifdef RPR_DEBUG" so that logs are not
      contaminated by RPR debug output when log_min_messages is set to
      DEBUG1 or higher.
    
    0006-Row-pattern-recognition-patch-docs.patch
    - Same as before. (previously it was 0005-Row-pattern-recognition-patch-docs.patch)
    
    0007-Row-pattern-recognition-patch-tests.patch
    - Same as before. (previously it was 0006-Row-pattern-recognition-patch-tests.patch)
    
    0008-Allow-to-print-raw-parse-tree.patch
    - Same as before.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  70. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-03-28T10:59:25Z

    Attached is the v15 patch. No changes are made except rebasing due to
    recent grammar changes.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
  71. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-04-12T07:09:08Z

    Attached is the v16 patch. No changes are made except rebasing due to
    recent grammar changes.
    
    Also I removed chen@sraoss.co.jp from the Cc: list. The email address
    is no longer valid.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  72. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-04-24T03:12:44Z

    Hi Vik and Champion,
    
    I think the current RPR patch is not quite correct in handling
    count(*).
    
    (using slightly modified version of Vik's example query)
    
    SELECT v.a, count(*) OVER w
    FROM (VALUES ('A'),('B'),('B'),('C')) AS v (a)
    WINDOW w AS (
      ORDER BY v.a
      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
      PATTERN (B+)
      DEFINE B AS a = 'B'
    )
     a | count 
    ---+-------
     A |     0
     B |     2
     B |      
     C |     0
    (4 rows)
    
    Here row 3 is skipped because the pattern B+ matches rows 2 and 3. In
    this case I think count(*) should return 0 rather than NULL for row 3.
    
    What do you think?
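
    The proposed behavior can be checked outside the server with a small
    simulation. This is a minimal sketch, not the patch's executor code: it
    assumes AFTER MATCH SKIP PAST LAST ROW (which the skipped row 3 in the
    output implies), and count_over_rpr is a hypothetical helper name.

```python
def count_over_rpr(values, is_b):
    """Simulate count(*) OVER w for PATTERN (B+), anchored at each row.

    Assumes AFTER MATCH SKIP PAST LAST ROW: rows consumed by an earlier
    match get an empty reduced frame, so count(*) is 0 for them.
    """
    counts = []
    skip_until = 0  # rows before this index were consumed by an earlier match
    for i in range(len(values)):
        if i < skip_until:
            counts.append(0)  # skipped row: empty frame, count(*) = 0
            continue
        j = i
        while j < len(values) and is_b(values[j]):
            j += 1  # greedy B+: extend while rows satisfy DEFINE B
        counts.append(j - i)  # 0 when no match starts at this row
        if j > i:
            skip_until = j
    return counts


rows = ["A", "B", "B", "C"]
print(count_over_rpr(rows, lambda a: a == "B"))  # [0, 2, 0, 0]
```

    With this reading, row 3 reports 0 rather than NULL, the same as the
    unmatched rows 1 and 4.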
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  73. Re: Row pattern recognition

    Jacob Champion <jacob.champion@enterprisedb.com> — 2024-04-24T17:55:29Z

    On Tue, Apr 23, 2024 at 8:13 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
    > SELECT v.a, count(*) OVER w
    > FROM (VALUES ('A'),('B'),('B'),('C')) AS v (a)
    > WINDOW w AS (
    >   ORDER BY v.a
    >   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >   PATTERN (B+)
    >   DEFINE B AS a = 'B'
    > )
    >  a | count
    > ---+-------
    >  A |     0
    >  B |     2
    >  B |
    >  C |     0
    > (4 rows)
    >
    > Here row 3 is skipped because the pattern B+ matches rows 2 and 3. In
    > this case I think count(*) should return 0 rather than NULL for row 3.
    
    I think returning zero would match Vik's explanation upthread [1],
    yes. Unfortunately I don't have a spec handy to double-check for
    myself right now.
    
    --Jacob
    
    [1] https://www.postgresql.org/message-id/c9ebc3d0-c3d1-e8eb-4a57-0ec099cbda17%40postgresfriends.org
    
    
    
    
  74. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-04-26T06:09:32Z

    > On Tue, Apr 23, 2024 at 8:13 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
    >> SELECT v.a, count(*) OVER w
    >> FROM (VALUES ('A'),('B'),('B'),('C')) AS v (a)
    >> WINDOW w AS (
    >>   ORDER BY v.a
    >>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >>   PATTERN (B+)
    >>   DEFINE B AS a = 'B'
    >> )
    >>  a | count
    >> ---+-------
    >>  A |     0
    >>  B |     2
    >>  B |
    >>  C |     0
    >> (4 rows)
    >>
    >> Here row 3 is skipped because the pattern B+ matches rows 2 and 3. In
    >> this case I think count(*) should return 0 rather than NULL for row 3.
    > 
    > I think returning zero would match Vik's explanation upthread [1],
    > yes. Unfortunately I don't have a spec handy to double-check for
    > myself right now.
    
    Ok. I believe you and Vik are correct.
    I am modifying the patch in this direction.
    Attached is the regression diff after modifying the patch.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  75. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-04-28T11:28:26Z

    >> I think returning zero would match Vik's explanation upthread [1],
    >> yes. Unfortunately I don't have a spec handy to double-check for
    >> myself right now.
    > 
    > Ok. I believe you and Vik are correct.
    > I am modifying the patch in this direction.
    
    Attached are the v17 patches, which take that direction. Differences
    from v16 include:
    
    - In the 0005 executor patch, aggregation in RPR now always restarts for
      each row. This is necessary so that aggregates run over unmatched rows
      (skipped by an earlier match) and empty matches (no pattern variable
      matched) properly produce NULL (most aggregates) or 0 (count). In v16
      I had a hack that used a flag to force the aggregation result to be
      NULL for no match or empty match in finalize_windowaggregate(). v17
      eliminates that hack.
    
    - The 0006 docs and 0007 test patches are adjusted to reflect the RPR
      output changes in 0005.
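
    The effect of restarting aggregation for each row can be illustrated
    with a toy model. This is only a sketch under assumptions: CountAgg,
    SumAgg and run_per_row are hypothetical names, and the (start, length)
    pairs stand in for the reduced frame of each row; none of this is the
    patch's C code.

```python
class CountAgg:
    """count(*): the initial state is 0, so an empty frame yields 0."""

    def __init__(self):
        self.n = 0

    def step(self, value):
        self.n += 1

    def final(self):
        return self.n


class SumAgg:
    """sum(): the initial state is NULL (None), so an empty frame yields NULL."""

    def __init__(self):
        self.total = None

    def step(self, value):
        self.total = value if self.total is None else self.total + value

    def final(self):
        return self.total


def run_per_row(agg_cls, values, frames):
    """Restart the aggregate for every row and feed it that row's reduced frame."""
    results = []
    for start, length in frames:
        agg = agg_cls()  # fresh state per row, no special-case flag needed
        for value in values[start:start + length]:
            agg.step(value)
        results.append(agg.final())
    return results


prices = [10, 20, 30, 40]
# Row 2 heads a two-row match; rows 1, 3 and 4 have empty reduced frames.
frames = [(0, 0), (1, 2), (2, 0), (3, 0)]
print(run_per_row(CountAgg, prices, frames))  # [0, 2, 0, 0]
print(run_per_row(SumAgg, prices, frames))    # [None, 50, None, None]
```

    Because each aggregate starts from its own initial state, an empty
    frame gives 0 for count and NULL for the others without a forcing flag.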
      
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  76. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-05-11T07:23:07Z

    Attached are the v18 patches. They fix conflicts caused by this recent commit:
    
    7d2c7f08d9 Fix query pullup issue with WindowClause runCondition
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  77. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-05-15T00:02:03Z

    Attached are the v19 patches. Changes from v18 include:
    
    0002:
    - add a check whether DEFINE clause includes subqueries. If so, error out.
    0007:
    - fix wrong test (row pattern definition variable name must not appear
      more than once)
    - remove unnecessary test (undefined define variable is not allowed).
      We already allow undefined variables.
    - add tests: subqueries and aggregates in DEFINE clause are not
      supported. The standard allows them but I have not implemented them
      yet.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  78. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-05-24T02:39:19Z

    Attached are the v20 patches. Just rebased.
    (The conflict was in 0001 patch.)
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  79. Re: Row pattern recognition

    Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-06-13T00:25:01Z

    I gave a talk on RPR at PGConf.dev 2024.
    https://www.pgevents.ca/events/pgconfdev2024/schedule/session/114-implementing-row-pattern-recognition/
    (Slides are available from the link).
    
    Vik Fearing and Jacob Champion were in the audience and we had a
    small discussion after the talk. We continued the discussion off list
    on how to move the RPR implementation project forward. One of the
    ideas is to summarize what is in the patch and what is not, from the
    SQL standard specification's point of view. This should help us
    reach consensus on a "minimum viable" feature set if we want
    to bring the patch into the upcoming PostgreSQL v18.
    
    Here is the first cut of the document. Comments/feedback are welcome.
    
    -------------------------------------------------------------------------
    This memo describes the current status of implementation of SQL/RPR
    (Row Pattern Recognition), as of June 13, 2024 (the latest patch is v20).
    
    - RPR in FROM clause and WINDOW clause
    
    The SQL standard defines two features regarding SQL/RPR - R010 (RPR in
    FROM clause) and R020 (RPR in WINDOW clause). Only R020 is
    implemented. From here on, we discuss only R020.
    
    - Overview of R020 syntax
    
    WINDOW window_name AS (
    [ PARTITION BY ... ]
    [ ORDER BY... ]
    [ MEASURES ... ]
    ROWS BETWEEN CURRENT ROW AND ...
    [ AFTER MATCH SKIP ... ]
    [ INITIAL|SEEK ]
    PATTERN (...)
    [ SUBSET ... ]
    DEFINE ...
    )
    
    -- PARTITION BY and ORDER BY are not specific to RPR and are
      already available in current PostgreSQL.
    
    -- What is (partially) implemented:
    
    AFTER MATCH SKIP
    INITIAL|SEEK
    PATTERN
    DEFINE
    
    -- What is not implemented at all:
    MEASURES
    SUBSET
    
    The following is the detailed status of each clause.
    
    - AFTER MATCH SKIP
    
    -- Implemented:
    AFTER MATCH SKIP TO NEXT ROW
    AFTER MATCH SKIP PAST LAST ROW
    
    -- Not implemented:
    AFTER MATCH SKIP TO FIRST|LAST pattern_variable
    
    - INITIAL|SEEK
    
    -- Implemented:
    INITIAL
    
    -- Not implemented:
    SEEK
    
    - DEFINE
    
    -- Row pattern navigation operations PREV and NEXT are partially
       implemented. FIRST and LAST are not implemented.
    
    -- The standard says PREV and NEXT accept an optional argument "offset"
       but it's not implemented.
    
    -- The standard says the row pattern navigation operations can be
       nested but it's not implemented.
    
    -- CLASSIFIER, use of aggregate functions and subqueries in DEFINE
       clause are not implemented.
    
    - PATTERN
    
    -- The following are implemented:
    +: 1 or more rows
    *: 0 or more rows
    
    -- The following are not implemented:
    ?: 0 or 1 row
    A | B: OR condition
    (A B): grouping
    {n}: n rows
    {n,}: n or more rows
    {n,m}: greater or equal to n rows and less than or equal to m rows
    {,m}: 0 or more rows and less than or equal to m rows
    -------------------------------------------------------------------------
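
    The quantifiers listed in the memo can be read as (minimum, maximum)
    row counts. The following is a minimal sketch of that reading, not code
    from the patch: quantifier_bounds is a hypothetical helper, None stands
    for "unbounded", and treating an omitted lower bound in {,m} as 0 is an
    assumption about the standard's quantifier syntax.

```python
import re


def quantifier_bounds(quantifier):
    """Map a PATTERN quantifier to (min_rows, max_rows); None means unbounded."""
    simple = {"+": (1, None), "*": (0, None), "?": (0, 1)}
    if quantifier in simple:
        return simple[quantifier]
    m = re.fullmatch(r"\{(\d*)(,?)(\d*)\}", quantifier)
    if not m or (not m.group(1) and not m.group(3)):
        raise ValueError(f"unsupported quantifier: {quantifier!r}")
    low_text, comma, high_text = m.groups()
    low = int(low_text) if low_text else 0  # assumption: {,m} starts at 0
    if not comma:
        return (low, low)  # {n}: exactly n rows
    return (low, int(high_text) if high_text else None)


for q in ["+", "*", "?", "{3}", "{3,}", "{2,4}", "{,3}"]:
    print(q, quantifier_bounds(q))
```

    Only "+" and "*" correspond to quantifiers the v20 patch implements;
    the other entries describe forms the memo lists as not implemented.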
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS LLC
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  80. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2024-08-26T04:39:47Z

    Attached are the v21 patches. Just rebased.
    (The conflict was in 0001 patch.)
    
    The 0008 patch is just for debugging purposes. You can ignore it.
    This hasn't changed, but I mention it just in case.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  81. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2024-09-19T04:59:47Z

    Attached are the v22 patches. Just rebased.  The conflict was in 0001
    patch due to commit 89f908a6d0 "Add temporal FOREIGN KEY contraints".
    
    The 0008 patch is just for debugging purposes. You can ignore it.
    This hasn't changed, but I mention it just in case.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  82. Re: Row pattern recognition

    Jacob Champion <jacob.champion@enterprisedb.com> — 2024-09-27T22:27:07Z

    On Wed, Sep 18, 2024 at 10:00 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
    >
    > Attached are the v22 patches. Just rebased.
    
    Thanks!
    
    With some bigger partitions, I hit an `ERROR:  wrong pos: 1024`. A
    test that reproduces it is attached.
    
    While playing with the feature, I've been trying to identify runs of
    matched rows by eye. But it's pretty difficult -- the best I can do is
    manually count rows using a `COUNT(*) OVER ...`. So I'd like to
    suggest that MEASURES be part of the eventual v1 feature, if there's
    no other way to determine whether a row was skipped by a previous
    match. (That was less obvious to me before the fix in v17.)
    
    --
    
    I've been working on an implementation [1] of SQL/RPR's "parenthesized
    language" and preferment order. (These are defined in SQL/Foundation
    2023, section 9.41.) The tool gives you a way to figure out, for a
    given pattern, what matches are supposed to be attempted and in what
    order:
    
        $ ./src/test/modules/rpr/rpr_prefer "a b? a"
        ( ( a ( b ) ) a )
        ( ( a ( ) ) a )
    
    Many simple patterns result in an infinite set of possible matches. So
    if you use an unbounded quantifier, you have to also use --max-rows
    to limit the size of the hypothetical window frame:
    
        $ ./src/test/modules/rpr/rpr_prefer --max-rows 2 "^ PERMUTE(a*, b+)? $"
        ( ( ^ ( ( ( ( ( ( a ) ( b ) ) ) - ) ) ) ) $ )
        ( ( ^ ( ( ( ( ( ( ) ( b b ) ) ) - ) ) ) ) $ )
        ( ( ^ ( ( ( ( ( ( ) ( b ) ) ) - ) ) ) ) $ )
        ( ( ^ ( ( ( - ( ( ( b b ) ( ) ) ) ) ) ) ) $ )
        ( ( ^ ( ( ( - ( ( ( b ) ( a ) ) ) ) ) ) ) $ )
        ( ( ^ ( ( ( - ( ( ( b ) ( ) ) ) ) ) ) ) $ )
        ( ( ^ ( ) ) $ )
    
    I've found this useful to check my personal understanding of the spec
    and the match behavior, but it could also potentially be used to
    generate test cases, or to help users debug their own patterns. For
    example, a pattern that has a bunch of duplicate sequences in its PL
    is probably not very well optimized:
    
        $ ./src/test/modules/rpr/rpr_prefer --max-rows 4 "a+ a+"
        ( ( a a a ) ( a ) )
        ( ( a a ) ( a a ) )
        ( ( a a ) ( a ) )
        ( ( a ) ( a a a ) )
        ( ( a ) ( a a ) )
        ( ( a ) ( a ) )
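
    The listing above can be reproduced for this one simple case with a few
    lines of Python. This is not the rpr_prefer tool and does not implement
    the standard's preferment rules in general: it is a toy that only
    handles a concatenation of greedy "+" elements, and
    prefer_plus_sequence is a hypothetical name.

```python
def prefer_plus_sequence(n_vars, max_rows, name="a"):
    """Yield parenthesized matches for n_vars consecutive `name+` elements.

    Greedy order: each element tries its longest run first, while leaving
    at least one row for every element that follows it.
    """

    def expand(remaining_vars, rows_left):
        if remaining_vars == 0:
            yield []
            return
        most = rows_left - (remaining_vars - 1)  # reserve one row per later element
        for take in range(most, 0, -1):  # longest run first
            for rest in expand(remaining_vars - 1, rows_left - take):
                yield [take] + rest

    for counts in expand(n_vars, max_rows):
        groups = " ".join("( " + " ".join([name] * c) + " )" for c in counts)
        yield "( " + groups + " )"


for line in prefer_plus_sequence(2, 4):
    print(line)
```

    For two elements and four rows this prints the same six lines, in the
    same order, as the "a+ a+" output shown above.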
    
    And patterns with catastrophic backtracking behavior tend to show a
    "sawtooth" pattern in the output, with a huge number of potential
    matches being generated relative to the number of rows in the frame.
    
    My implementation is really messy -- it leaks memory like a sieve, and
    I cannibalized the parser from ECPG, which just ended up as an
    exercise in teaching myself flex/bison. But if there's interest in
    having this kind of tool in the tree, I can work on making it
    reviewable. Either way, I should be able to use it to double-check
    more complicated test cases.
    
    A while back [2], you were wondering whether our Bison implementation
    would be able to parse the PATTERN grammar directly. I think this tool
    proves that the answer is "yes", but PERMUTE in particular causes a
    shift/reduce conflict. To fix it, I applied the same precedence
    workaround that we use for CUBE and ROLLUP.
    
    Thanks again!
    --Jacob
    
    [1] https://github.com/jchampio/postgres/tree/dev/rpr
    [2] https://www.postgresql.org/message-id/20230721.151648.412762379013769790.t-ishii%40sranhm.sra.co.jp
    
  83. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2024-09-28T10:43:59Z

    > With some bigger partitions, I hit an `ERROR:  wrong pos: 1024`. A
    > test that reproduces it is attached.
    
    Thanks for the report. Attached is a patch on top of the v22 patches to
    fix the bug. We keep info in an array
    (WindowAggState.reduced_frame_map) to track the RPR pattern match
    result status for each row in a frame. If a pattern match succeeds, the
    first row in the reduced frame has status RF_FRAME_HEAD and the rest of
    the rows have status RF_SKIPPED. A row where the pattern match failed
    has status RF_UNMATCHED. Any row which has never been tested has status
    RF_NOT_DETERMINED. At the beginning the map is initialized with 1024
    entries, all in RF_NOT_DETERMINED state. Eventually they are replaced
    with states other than RF_NOT_DETERMINED. In the error case the RPR
    engine tried to find the 1024th row's state in the map and failed
    because that row's state had not been tested yet. I think we should
    treat it as RF_NOT_DETERMINED rather than an error. The attached patch
    does that.
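
    The idea behind the fix can be modeled as follows. This is a sketch
    only, written in Python rather than the executor's C: the state names
    and the initial size of 1024 come from the description above, while the
    ReducedFrameMap class, its methods, and the doubling growth policy are
    assumptions made for illustration.

```python
from enum import Enum


class RF(Enum):
    NOT_DETERMINED = 0
    FRAME_HEAD = 1
    SKIPPED = 2
    UNMATCHED = 3


INITIAL_SIZE = 1024  # initial number of map entries, per the description


class ReducedFrameMap:
    def __init__(self):
        self.states = [RF.NOT_DETERMINED] * INITIAL_SIZE

    def get(self, pos):
        if pos < 0:
            raise IndexError(f"wrong pos: {pos}")
        if pos >= len(self.states):
            # A position beyond the map has simply never been tested.
            return RF.NOT_DETERMINED
        return self.states[pos]

    def set(self, pos, state):
        while pos >= len(self.states):  # grow by doubling (assumed policy)
            self.states.extend([RF.NOT_DETERMINED] * len(self.states))
        self.states[pos] = state

    def record_match(self, head, length):
        """Mark a successful match: one frame head, the rest skipped."""
        self.set(head, RF.FRAME_HEAD)
        for pos in range(head + 1, head + length):
            self.set(pos, RF.SKIPPED)


frame_map = ReducedFrameMap()
print(frame_map.get(1024))  # RF.NOT_DETERMINED instead of an error
```

    Looking up position 1024 before any row there has been tested now
    reports RF_NOT_DETERMINED, which lets the caller go on and run the
    pattern match for that row.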
    
    > While playing with the feature, I've been trying to identify runs of
    > matched rows by eye. But it's pretty difficult -- the best I can do is
    > manually count rows using a `COUNT(*) OVER ...`. So I'd like to
    > suggest that MEASURES be part of the eventual v1 feature, if there's
    > no other way to determine whether a row was skipped by a previous
    > match. (That was less obvious to me before the fix in v17.)
    
    I think implementing MEASURES is challenging. In particular, we need to
    work out how our parser could accept "colname OVER
    window_definition". Currently PostgreSQL's parser only accepts "func()
    OVER window_definition". Even if it is technically possible, I think the
    v1 patch will become much larger than it is now because of this.
    
    How about inventing a new window function that returns the row state instead?
    
    - match found (yes/no)
    - skipped due to AFTER MATCH SKIP PAST LAST ROW (no match)
    
    For the rest of the mail I need more time to understand. I will reply
    back after studying it. For now, I just want to thank you for the
    valuable information!
    
    > --
    > 
    > I've been working on an implementation [1] of SQL/RPR's "parenthesized
    > language" and preferment order. (These are defined in SQL/Foundation
    > 2023, section 9.41.) The tool gives you a way to figure out, for a
    > given pattern, what matches are supposed to be attempted and in what
    > order:
    > 
    >     $ ./src/test/modules/rpr/rpr_prefer "a b? a"
    >     ( ( a ( b ) ) a )
    >     ( ( a ( ) ) a )
    > 
    > Many simple patterns result in an infinite set of possible matches. So
    > if you use an unbounded quantifier, you have to also use --max-rows
    > to limit the size of the hypothetical window frame:
    > 
    >     $ ./src/test/modules/rpr/rpr_prefer --max-rows 2 "^ PERMUTE(a*, b+)? $"
    >     ( ( ^ ( ( ( ( ( ( a ) ( b ) ) ) - ) ) ) ) $ )
    >     ( ( ^ ( ( ( ( ( ( ) ( b b ) ) ) - ) ) ) ) $ )
    >     ( ( ^ ( ( ( ( ( ( ) ( b ) ) ) - ) ) ) ) $ )
    >     ( ( ^ ( ( ( - ( ( ( b b ) ( ) ) ) ) ) ) ) $ )
    >     ( ( ^ ( ( ( - ( ( ( b ) ( a ) ) ) ) ) ) ) $ )
    >     ( ( ^ ( ( ( - ( ( ( b ) ( ) ) ) ) ) ) ) $ )
    >     ( ( ^ ( ) ) $ )
    > 
    > I've found this useful to check my personal understanding of the spec
    > and the match behavior, but it could also potentially be used to
    > generate test cases, or to help users debug their own patterns. For
    > example, a pattern that has a bunch of duplicate sequences in its PL
    > is probably not very well optimized:
    > 
    >     $ ./src/test/modules/rpr/rpr_prefer --max-rows 4 "a+ a+"
    >     ( ( a a a ) ( a ) )
    >     ( ( a a ) ( a a ) )
    >     ( ( a a ) ( a ) )
    >     ( ( a ) ( a a a ) )
    >     ( ( a ) ( a a ) )
    >     ( ( a ) ( a ) )
    > 
    > And patterns with catastrophic backtracking behavior tend to show a
    > "sawtooth" pattern in the output, with a huge number of potential
    > matches being generated relative to the number of rows in the frame.
    > 
    > My implementation is really messy -- it leaks memory like a sieve, and
    > I cannibalized the parser from ECPG, which just ended up as an
    > exercise in teaching myself flex/bison. But if there's interest in
    > having this kind of tool in the tree, I can work on making it
    > reviewable. Either way, I should be able to use it to double-check
    > more complicated test cases.
    > 
    > A while back [2], you were wondering whether our Bison implementation
    > would be able to parse the PATTERN grammar directly. I think this tool
    > proves that the answer is "yes", but PERMUTE in particular causes a
    > shift/reduce conflict. To fix it, I applied the same precedence
    > workaround that we use for CUBE and ROLLUP.
    > 
    > Thanks again!
    > --Jacob
    > 
    > [1] https://github.com/jchampio/postgres/tree/dev/rpr
    > [2] https://www.postgresql.org/message-id/20230721.151648.412762379013769790.t-ishii%40sranhm.sra.co.jp
    
  84. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2024-09-30T00:07:51Z

    >> While playing with the feature, I've been trying to identify runs of
    >> matched rows by eye. But it's pretty difficult -- the best I can do is
    >> manually count rows using a `COUNT(*) OVER ...`. So I'd like to
    >> suggest that MEASURES be part of the eventual v1 feature, if there's
    >> no other way to determine whether a row was skipped by a previous
    >> match. (That was less obvious to me before the fix in v17.)
    > 
    > I think implementing MEASURES is challenging. In particular, we need to
    > work out how our parser could accept "colname OVER
    > window_definition". Currently PostgreSQL's parser only accepts "func()
    > OVER window_definition". Even if it is technically possible, I think the
    > v1 patch will become much larger than it is now because of this.
    > 
    > How about inventing a new window function that returns the row state instead?
    > 
    > - match found (yes/no)
    > - skipped due to AFTER MATCH SKIP PAST LAST ROW (no match)
    
    Please disregard my proposal. Even if we make such a function, it
    would always return NULL for unmatched rows or skipped rows, and I
    think the function does not solve your problem.
    
    However, I wonder if supporting MEASURES solves the problem either
    because any columns defined by MEASURES will return NULL except the
    first row in a reduced frame. Can you please show an example of how to
    identify runs of matched rows using MEASURES?
    
    > For the rest of the mail I need more time to understand. I will reply
    > back after studying it. For now, I just want to thank you for the
    > valuable information!
    > 
    >> --
    >> 
    >> I've been working on an implementation [1] of SQL/RPR's "parenthesized
    >> language" and preferment order. (These are defined in SQL/Foundation
    >> 2023, section 9.41.) The tool gives you a way to figure out, for a
    >> given pattern, what matches are supposed to be attempted and in what
    >> order:
    >> 
    >>     $ ./src/test/modules/rpr/rpr_prefer "a b? a"
    >>     ( ( a ( b ) ) a )
    >>     ( ( a ( ) ) a )
    >> 
    >> Many simple patterns result in an infinite set of possible matches. So
    >> if you use an unbounded quantifier, you have to also use --max-rows
    >> to limit the size of the hypothetical window frame:
    >> 
    >>     $ ./src/test/modules/rpr/rpr_prefer --max-rows 2 "^ PERMUTE(a*, b+)? $"
    >>     ( ( ^ ( ( ( ( ( ( a ) ( b ) ) ) - ) ) ) ) $ )
    >>     ( ( ^ ( ( ( ( ( ( ) ( b b ) ) ) - ) ) ) ) $ )
    >>     ( ( ^ ( ( ( ( ( ( ) ( b ) ) ) - ) ) ) ) $ )
    >>     ( ( ^ ( ( ( - ( ( ( b b ) ( ) ) ) ) ) ) ) $ )
    >>     ( ( ^ ( ( ( - ( ( ( b ) ( a ) ) ) ) ) ) ) $ )
    >>     ( ( ^ ( ( ( - ( ( ( b ) ( ) ) ) ) ) ) ) $ )
    >>     ( ( ^ ( ) ) $ )
    
    I wonder how Oracle solves the problem (an infinite set of possible
    matches) without using "--max-rows" or something like that because in
    my understanding Oracle supports the regular expressions and PERMUTE.
    
    >> I've found this useful to check my personal understanding of the spec
    >> and the match behavior, but it could also potentially be used to
    >> generate test cases, or to help users debug their own patterns. For
    >> example, a pattern that has a bunch of duplicate sequences in its PL
    >> is probably not very well optimized:
    >> 
    >>     $ ./src/test/modules/rpr/rpr_prefer --max-rows 4 "a+ a+"
    >>     ( ( a a a ) ( a ) )
    >>     ( ( a a ) ( a a ) )
    >>     ( ( a a ) ( a ) )
    >>     ( ( a ) ( a a a ) )
    >>     ( ( a ) ( a a ) )
    >>     ( ( a ) ( a ) )
    >> 
    >> And patterns with catastrophic backtracking behavior tend to show a
    >> "sawtooth" pattern in the output, with a huge number of potential
    >> matches being generated relative to the number of rows in the frame.
    >> 
    >> My implementation is really messy -- it leaks memory like a sieve, and
    >> I cannibalized the parser from ECPG, which just ended up as an
    >> exercise in teaching myself flex/bison. But if there's interest in
    >> having this kind of tool in the tree, I can work on making it
    >> reviewable. Either way, I should be able to use it to double-check
    >> more complicated test cases.
    
    I definitely am interested in the tool!
    
    >> A while back [2], you were wondering whether our Bison implementation
    >> would be able to parse the PATTERN grammar directly. I think this tool
    >> proves that the answer is "yes", but PERMUTE in particular causes a
    >> shift/reduce conflict. To fix it, I applied the same precedence
    >> workaround that we use for CUBE and ROLLUP.
    
    That's good news!
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  85. Re: Row pattern recognition

    Jacob Champion <jacob.champion@enterprisedb.com> — 2024-10-01T12:48:02Z

    On Sun, Sep 29, 2024 at 5:08 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
    > > I think implementing MEASURES is challenging. In particular, we need to
    > > work out how our parser could accept "colname OVER
    > > window_definition". Currently PostgreSQL's parser only accepts "func()
    > > OVER window_definition". Even if it is technically possible, I think the
    > > v1 patch will become much larger than it is now because of this.
    
    [resending, to the whole list this time]
    
    Yeah. In any case, I'm not the right person to bolt MEASURES onto the
    existing grammar... my misadventures in the PATTERN parser have
    highlighted how little I know about Bison. :D
    
    > Please disregard my proposal. Even if we make such a function, it
    > would always return NULL for unmatched rows or skipped rows, and I
    > think the function does not solve your problem.
    >
    > However, I wonder if supporting MEASURES solves the problem either
    > because any columns defined by MEASURES will return NULL except the
    > first row in a reduced frame. Can you please show an example of how to
    > identify runs of matched rows using MEASURES?
    
    I think you're probably right; my suggestion can't distinguish between
    skipped (but previously matched) rows and entirely-unmatched rows. The
    test case I'd been working with returned an empty match as a fallback,
    so it wouldn't have had that problem in practice. I was hoping that
    one of the existing whole-partition window functions would allow me to
    cobble something together based on the COUNT(*) measure, but after
    searching for a while I haven't been able to come up with a solution.
    
    Maybe it's just too niche for the window-function version of this --
    after all, it only makes sense when using both INITIAL and AFTER MATCH
    SKIP PAST LAST ROW. A more general solution could identify the
    row_number of the first and last rows of the window frame, perhaps?
    But a frame isn't guaranteed to be contiguous, so maybe that doesn't
    make sense either. Ugh.
    
    > I wonder how Oracle solves the problem (an infinite set of possible
    > matches) without using "--max-rows" or something like that because in
    > my understanding Oracle supports the regular expressions and PERMUTE.
    
    I chose a confusing way to describe it, sorry. The parenthesized
    language for a pattern can be an infinite set, because A+ could match
    "( A )" or "( A A )" or "( A A A )" and so on forever. But that
    doesn't apply to our regex engine in practice; our tables have a
    finite number of rows, and I *think* the PL for a finite number of
    rows is also finite, due to the complicated rules on where empty
    matches are allowed to appear in the language. (In any case, my tool
    doesn't guard against infinite recursion...)
    
    > >> My implementation is really messy -- it leaks memory like a sieve, and
    > >> I cannibalized the parser from ECPG, which just ended up as an
    > >> exercise in teaching myself flex/bison. But if there's interest in
    > >> having this kind of tool in the tree, I can work on making it
    > >> reviewable. Either way, I should be able to use it to double-check
    > >> more complicated test cases.
    >
    > I definitely am interested in the tool!
    
    Okay, good to know! I will need to clean it up considerably, and
    figure out whether I've duplicated more code than I should have.
    
    > >> A while back [2], you were wondering whether our Bison implementation
    > >> would be able to parse the PATTERN grammar directly. I think this tool
    > >> proves that the answer is "yes", but PERMUTE in particular causes a
    > >> shift/reduce conflict. To fix it, I applied the same precedence
    > >> workaround that we use for CUBE and ROLLUP.
    >
    > That's good news!
    
    +1
    
    Thanks,
    --Jacob
    
    
    
    
  86. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2024-10-22T04:53:43Z

    Hi,
    
    I wonder how "PREV(col + 1)" is different from "PREV(col) + 1".
    Currently my RPR implementation does not allow PREV(col + 1). If
    "PREV(col + 1)" is different from "PREV(col) + 1", it may be worthwhile
    to implement "PREV(col + 1)".
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  87. Row pattern recognition

    David G. Johnston <david.g.johnston@gmail.com> — 2024-10-22T05:29:34Z

    On Monday, October 21, 2024, Tatsuo Ishii <ishii@postgresql.org> wrote:
    >
    > I wonder how "PREV(col + 1)" is different from "PREV(col) + 1".
    > Currently my RPR implementation does not allow PREV(col + 1). If
    > "PREV(col + 1)" is different from "PREV(col) + 1", it may be worthwhile
    > to implement "PREV(col + 1)".
    >
    
    Interesting feature that I’m now just seeing.
    
    The expression PREV(column_name) produces a value output taken from the
    given named column in the preceding frame row.  It doesn’t make any sense
    to me to attempt to add the integer 1 to an identifier that is being used
    as a value input to a “function”.  It would also seem quite odd if “+ 1”
    had something to do with row selection as opposed to simply being an
    operator “+(column_name%type, integer)” expression.
    
    Maybe RPR is defining something special here I haven't yet picked up on, in
    which case just ignore this.  But if I read: “UP as price > prev(price +
    1)” in the opening example it would be quite non-intuitive to reason out
    the meaning.  “Price > prev(price) + 1” would mean my current row is at
    least one (e.g. dollar per share) more than the value of the previous
    period.
    
    David J.
    
  88. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2024-10-22T10:19:09Z

    > On Monday, October 21, 2024, Tatsuo Ishii <ishii@postgresql.org> wrote:
    >>
    >> I wonder how "PREV(col + 1)" is different from "PREV(col) + 1".
    >> Currently my RPR implementation does not allow PREV(col + 1). If
    >> "PREV(col + 1)" is different from "PREV(col) + 1", it may be worthwhile
    >> to implement "PREV(col + 1)".
    >>
    > 
    > Interesting feature that I’m now just seeing.
    > 
    > The expression PREV(column_name) produces a value output taken from the
    > given named column in the preceding frame row.  It doesn’t make any sense
    > to me to attempt to add the integer 1 to an identifier that is being used
    > as a value input to a “function”.  It would also seem quite odd if “+ 1”
    > had something to do with row selection as opposed to simply being an
    > operator “+(column_name%type, integer)” expression.
    
    According to the ISO/IEC 9075-2:2016, 6.26 <row pattern navigation
    operation> (I don't have access to SQL 2023) PREV (and NEXT) is
    defined as:
    
    <row pattern navigation: physical> ::=
     <prev or next> <left paren> <value expression> [ <comma> <physical offset> ] <right paren>
    
    (Besides <row pattern navigation: physical>, there are <row pattern
    navigation: logical> and <row pattern navigation: compound> but I
    ignore them here).
    
    So PREV's first argument is a value expression (VE). VE shall contain
    at least one row pattern column reference. <set function
    specification>, <window function specification> or <row pattern
    navigation operation> are not permitted.
    
    From this, I don't see any reason PREV(column_name + 1) is prohibited
    unless I am missing something.
    
    I think even PREV(column_name1 + column_name2) is possible. I see a
    similar example in ISO/IEC 19075-5:2021, 5.6.2 "PREV and NEXT".
    
    > Maybe RPR is defining something special here I haven't yet picked up on, in
    > which case just ignore this.  But if I read: “UP as price > prev(price +
    > 1)” in the opening example it would be quite non-intuitive to reason out
    > the meaning.  “Price > prev(price) + 1” would mean my current row is at
    > least one (e.g. dollar per share) more than the value of the previous
    > period.
    
    According to ISO/IEC 9075-2:2016 "4.21.2 Row pattern navigation operations",
    
      <row pattern navigation operation> evaluates a <value expression> VE
      in a row NR, which may be different than current row CR.
    
    From this I think PREV(col + 1) should be interpreted as:
    
    1. go to the previous row.
    2. evaluate "col + 1" at the current row (that was previous row).
    3. return the result.
    
    If my understanding is correct, prev(price + 1) has the same meaning
    as prev(price) + 1.
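
    A minimal sketch of the three steps above, written in Python rather
    than SQL. The rows, the prev() helper and the NULL handling are
    hypothetical and only model the reading of the standard discussed
    here; this is not how the patch implements PREV.

```python
# Hypothetical model: a partition is a list of rows (dicts), and a
# navigation operation evaluates an expression at a neighbouring row.
rows = [
    {"price": 100},
    {"price": 90},
    {"price": 95},
]


def prev(rows, current, expr, offset=1):
    """Evaluate expr at the row `offset` rows before `current`.

    Returns None when that row does not exist (assumed here to behave
    like SQL NULL).
    """
    target = current - offset          # 1. go to the previous row
    if target < 0:
        return None
    return expr(rows[target])          # 2. evaluate the expression there
                                       # 3. return the result


def add_one(value):
    """Model 'value + 1' with NULL propagation."""
    return None if value is None else value + 1


for cr in range(len(rows)):
    inside = prev(rows, cr, lambda r: r["price"] + 1)        # PREV(price + 1)
    outside = add_one(prev(rows, cr, lambda r: r["price"]))  # PREV(price) + 1
    print(cr, inside, outside)
```

    For an expression that only reads columns of the target row, both
    forms yield the same value in this model.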
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  89. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2024-10-22T13:12:43Z

    On 22/10/2024 12:19, Tatsuo Ishii wrote:
    > Acording to ISO/IEC 9075-2:2016 "4.21.2 Row pattern navigation operations",
    >
    >    <row pattern navigation operation> evaluates a <value expression> VE
    >    in a row NR, which may be different than current row CR.
    >
    >  From this I think PREV(col + 1) should be interpreted as:
    >
    > 1. go to the previous row.
    > 2. evaluate "col + 1" at the current row (that was previous row).
    > 3. return the result.
    >
    > If my understanding is correct, prev(price + 1) has the same meaning
    > as prev(price) + 1.
    
    
    
    This is how I read the specification also.
    
    -- 
    
    Vik Fearing
    
  90. Re: Row pattern recognition

    David G. Johnston <david.g.johnston@gmail.com> — 2024-10-22T14:19:41Z

    On Tue, Oct 22, 2024 at 6:12 AM Vik Fearing <vik@postgresfriends.org> wrote:
    
    >
    > On 22/10/2024 12:19, Tatsuo Ishii wrote:
    >
    > Acording to ISO/IEC 9075-2:2016 "4.21.2 Row pattern navigation operations",
    >
    >   <row pattern navigation operation> evaluates a <value expression> VE
    >   in a row NR, which may be different than current row CR.
    >
    > From this I think PREV(col + 1) should be interpreted as:
    >
    > 1. go to the previous row.
    > 2. evaluate "col + 1" at the current row (that was previous row).
    > 3. return the result.
    >
    > If my understanding is correct, prev(price + 1) has the same meaning
    > as prev(price) + 1.
    >
    >
    >
    > This is how I read the specification also.
    >
    >
    >
    That makes sense.  Definitely much nicer to only have to write PREV once if
    the expression you are evaluating involves multiple columns.  And is also
    consistent with window function "value" behavior.
    
    Thanks!
    
    David J.
    
  91. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2024-10-25T04:04:53Z

    > On Tue, Oct 22, 2024 at 6:12 AM Vik Fearing <vik@postgresfriends.org> wrote:
    > 
    >>
    >> On 22/10/2024 12:19, Tatsuo Ishii wrote:
    >>
    >> Acording to ISO/IEC 9075-2:2016 "4.21.2 Row pattern navigation operations",
    >>
    >>   <row pattern navigation operation> evaluates a <value expression> VE
    >>   in a row NR, which may be different than current row CR.
    >>
    >> From this I think PREV(col + 1) should be interpreted as:
    >>
    >> 1. go to the previous row.
    >> 2. evaluate "col + 1" at the current row (that was previous row).
    >> 3. return the result.
    >>
    >> If my understanding is correct, prev(price + 1) has the same meaning
    >> as prev(price) + 1.
    >>
    >>
    >>
    >> This is how I read the specification also.
    >>
    >>
    >>
    > That makes sense.  Definitely much nicer to only have to write PREV once if
    > the expression you are evaluating involves multiple columns.  And is also
    > consistent with window function "value" behavior.
    
    Thanks to all who joined the discussion. I decided to allow the
    argument of PREV and NEXT in my RPR patches to contain multiple
    columns and other expressions, e.g.:
    
    CREATE TEMP TABLE rpr1 (id INTEGER, i SERIAL, j INTEGER);
    INSERT INTO rpr1(id, j) SELECT 1, g*2 FROM generate_series(1, 10) AS g;
    SELECT id, i, j, count(*) OVER w
     FROM rpr1
     WINDOW w AS (
     PARTITION BY id
     ORDER BY i
     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
     AFTER MATCH SKIP PAST LAST ROW
     INITIAL
     PATTERN (START COND+)
     DEFINE
      START AS TRUE,
      COND AS PREV(i + j + 1) < 10
    );
     id | i  | j  | count 
    ----+----+----+-------
      1 |  1 |  2 |     3
      1 |  2 |  4 |     0
      1 |  3 |  6 |     0
      1 |  4 |  8 |     0
      1 |  5 | 10 |     0
      1 |  6 | 12 |     0
      1 |  7 | 14 |     0
      1 |  8 | 16 |     0
      1 |  9 | 18 |     0
      1 | 10 | 20 |     0
    (10 rows)
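
    The count column above can be checked by hand with a small Python
    model. This is only a sketch under simplifying assumptions (a single
    partition, greedy matching, and a first row whose PREV is treated as
    not true); the function names are hypothetical and unrelated to the
    patch's code.

```python
# Rows produced by: INSERT ... SELECT 1, g*2 FROM generate_series(1, 10);
# i comes from the SERIAL column, so i = 1..10 and j = 2 * i.
rows = [{"i": g, "j": g * 2} for g in range(1, 11)]


def cond(rows, k):
    """COND AS PREV(i + j + 1) < 10, evaluated at 0-based row index k."""
    if k == 0:
        return False                   # no previous row: treated as not true
    p = rows[k - 1]
    return p["i"] + p["j"] + 1 < 10


def match_counts(rows):
    """Greedy match of PATTERN (START COND+) with SKIP PAST LAST ROW."""
    counts = [0] * len(rows)
    start = 0
    while start < len(rows):
        end = start + 1                # START AS TRUE consumes one row
        while end < len(rows) and cond(rows, end):
            end += 1                   # COND+ consumes as many rows as it can
        if end - start >= 2:           # COND+ needs at least one row
            counts[start] = end - start
            start = end                # skip past the last matched row
        else:
            start += 1                 # no match starting at this row
    return counts


print(match_counts(rows))  # [3, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

    Rows 2 and 3 satisfy COND because PREV(i + j + 1) is 4 and 7; at row 4
    it is 10, so the match ends after three rows.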
    
    Attached are the v23 patches. V23 also includes the fix for the problem
    pointed out by Jacob Champion and test cases from him. Thank you, Jacob.
    https://www.postgresql.org/message-id/CAOYmi%2Bns3kHjC83ap_BCfJCL0wfO5BJ_sEByOEpgNOrsPhqQTg%40mail.gmail.com
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  92. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2024-12-19T06:19:50Z

    I have looked into the performance of the current RPR implementation,
    especially when the number of rows in a reduced frame is large (like
    over 10k). Below is a simple benchmark against a pgbench database. The
    SQL will produce a reduced frame having 10k rows.
    
    EXPLAIN (ANALYZE)
    SELECT aid, bid, count(*) OVER w
    FROM pgbench_accounts WHERE aid <= 10000
    WINDOW w AS (
    PARTITION BY bid
    ORDER BY aid
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    AFTER MATCH SKIP PAST LAST ROW
    INITIAL
    PATTERN (START UP+)
    DEFINE
    START AS TRUE,
    UP AS aid > PREV(aid)
    );
    
    This took 722 ms on my laptop, which is not very quick. Moreover, when
    I expanded the reduced frame size to 100k (aid <= 100000), the OOM
    killer was triggered. I looked into the code and found that
    do_pattern_match in nodeWindowAgg.c is one of the major problems. It
    calls regexp_instr to check whether the regular expression derived
    from a PATTERN clause (e.g. "ab+c+") matches an encoded row pattern
    variable string (e.g. "abbcc"). The latter string can be quite long:
    its length can be the same as the number of rows in the reduced
    frame, so it can reach 100k if the frame size is 100k. Unfortunately
    regexp_instr needs to allocate a buffer and convert the input string
    to wchar (4 bytes per character), which uses 4 times as much space as
    the original input string. In the RPR case the input string is always
    ASCII and does not need to be converted to wchar. So I decided to
    switch to the standard regular expression engine that comes with the
    OS. With this change, I got a 2x speedup in the 10k case.
    
    v23 patch: 722.618 ms (average of 3 runs)
    new patch: 322.5913 ms (average of 3 runs)
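
    A rough illustration of the encoding described above, using Python's
    re module as a stand-in for the regular expression engine. The
    letter assigned to each pattern variable and the helper names are
    assumptions made for this sketch, not taken from the patch.

```python
import re

# Hypothetical encoding: one ASCII letter per row pattern variable.
VAR_TO_CHAR = {"START": "a", "UP": "b", "DOWN": "c"}


def pattern_to_regex(pattern):
    """Turn [(variable, quantifier), ...] into a regular expression."""
    return "".join(VAR_TO_CHAR[var] + quant for var, quant in pattern)


def matched_rows(regex, encoded):
    """Number of leading rows of `encoded` matched by `regex` (0 if none)."""
    m = re.match(regex, encoded)
    return m.end() if m else 0


def buffer_sizes(n_rows, wide_char_bytes=4):
    """Bytes needed for the encoded string as ASCII and as wide chars."""
    return n_rows, n_rows * wide_char_bytes


regex = pattern_to_regex([("START", ""), ("UP", "+"), ("DOWN", "+")])
print(regex)                          # ab+c+
print(matched_rows(regex, "abbcc"))   # 5
print(buffer_sizes(100_000))          # (100000, 400000)
```

    One character per row means a 100k-row frame yields a 100k-character
    input, and a 4-byte wide-character copy of it takes four times that.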
    
    I also tried the 100k-row reduced frame case. It was slow (it took 26
    seconds) but it completed without triggering the OOM killer.
    Attached is the patch. The change is in 0005 only. The other patches
    are unchanged from v23.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  93. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2024-12-21T09:20:04Z

    > I have looked into the performance of current RPR implementation,
    > especially when the number of rows in a reduced frame is large (like
    > over 10k). Below is a simple benchmark against pgbench database. The
    > SQL will produce a reduced frame having 10k rows.
    > 
    > EXPLAIN (ANALYZE)
    > SELECT aid, bid, count(*) OVER w
    > FROM pgbench_accounts WHERE aid <= 10000
    > WINDOW w AS (
    > PARTITION BY bid
    > ORDER BY aid
    > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    > AFTER MATCH SKIP PAST LAST ROW
    > INITIAL
    > PATTERN (START UP+)
    > DEFINE
    > START AS TRUE,
    > UP AS aid > PREV(aid)
    > );
    > 
    > This took 722 ms on my laptop. It's not very quick. Moreover, if I
    > expand the reduced frame size to 100k (aid <= 100000), OOM killer
    > triggered. I looked into the code and found that do_pattern_match in
    > nodeWindowAgg.c is one of the major problems. It calls regexp_instr to
    > know whether the regular expression derived from a PATTERN clause
    > (e.g. "ab+c+") matches an encoded row pattern variable string
    > (e.g. "abbcc"). The latter string could be quite long: the length
    > could be as same as the number of rows in the reduced frame. Thus, The
    > length could become 100k if the frame size is 100k. Unfortunately
    > regexp_instr needs to allocate and convert the input string to wchar
    > (it's 4-byte long for each character), which uses 4x space bigger than
    > the original input string. In RPR case the input string is always
    > ASCII and does not need to be converted to wchar. So I decided to
    > switch to the standard regular expression engine coming with OS. With
    > this change, I got 2x speed up in the 10k case.
    > 
    > v23 patch: 722.618 ms (average of 3 runs)
    > new patch: 322.5913 ms (average of 3 runs)
    > 
    > Also I tried the 100k rows reduced frame case. It was slow (took 26
    > seconds) but it completed without OOM killer.  Attached is the
    > patch. The change was in 0005 only. Other patches were not changed
    > from v23.
    
    CFBot has started complaining about the patch. It fails in the
    Windows environment test because regex.h does not exist. I concluded
    that on Windows it's not a good idea to use the standard regexp
    library (I am not familiar with Windows. If my thinking is not
    correct, please let me know). So I switched to PostgreSQL's builtin
    core regexp library. Although its interface requires pg_wchar, which
    uses 4 times as much memory as the standard regexp (that's why I
    wanted to avoid it), the result does not seem so bad. It consumes
    only 10MB or so more memory when processing 100k rows in a frame. The
    good news is that it runs slightly faster than the standard regexp
    (19 vs. 26 seconds).
    
    Attached is the v25 patch, which uses PostgreSQL's regexp library.
    Most changes are in nodeWindowAgg.c, which is in the 5th patch.
    
    In the patches I also fixed some memory leaks and ran pgindent with
    an updated typedefs.list. The patch set now includes a patch for
    typedefs.list (the 8th patch).
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  94. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2024-12-30T13:37:18Z

    I have added a further optimization to the v25 patch.
    
    While generating possible input strings that may satisfy the pattern
    string, it is possible to skip running the regexp in some cases.
    Since regexp matching is a heavy operation, especially when applied
    to a long string, it is beneficial for RPR to reduce the number of
    regexp runs.
    
    If the tail pattern variable has a '+' quantifier, the input string
    was previously confirmed to match the pattern string, and the
    character appended is the same as the one for the tail pattern
    variable, we don't need to run the regexp match again because the new
    input string surely matches the pattern string. Suppose the pattern
    string is "ab+" and the current input string is "ab" (this satisfies
    "ab+"). If the new input string is "abb", then "abb" surely matches
    "ab+" too and we don't need to run the regexp again.
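
    The shortcut can be sketched in a few lines of Python. This is a
    simplified model under stated assumptions: the class and attribute
    names are hypothetical, Python's re stands in for the regexp library,
    and the caller is assumed to know the tail character and whether it
    carries a '+' quantifier.

```python
import re


class IncrementalMatcher:
    """Match a growing input string, skipping regexp runs when possible."""

    def __init__(self, regex, tail_char, tail_has_plus):
        self.compiled = re.compile(regex)
        self.tail_char = tail_char          # character of the tail variable
        self.tail_has_plus = tail_has_plus  # True if the tail is 'x+'
        self.text = ""
        self.matched = False                # did the previous string match?
        self.regex_runs = 0                 # counts real regexp executions

    def append(self, ch):
        """Append one encoded row and report whether the string matches."""
        previously_matched = self.matched
        self.text += ch
        if previously_matched and self.tail_has_plus and ch == self.tail_char:
            # "ab" matched "ab+", so "abb" must match as well: no regexp run.
            return True
        self.regex_runs += 1
        self.matched = self.compiled.fullmatch(self.text) is not None
        return self.matched


matcher = IncrementalMatcher("ab+", tail_char="b", tail_has_plus=True)
results = [matcher.append(ch) for ch in "a" + "b" * 10000]
print(results[0], results[1], results[-1])  # False True True
print(matcher.regex_runs)                   # 2
```

    In this model only the first two appends run the regexp; the
    remaining 9999 are answered by the shortcut.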
    
    In the v26 patch, using the technique above gives a performance
    improvement.
    
    >> EXPLAIN (ANALYZE)
    >> SELECT aid, bid, count(*) OVER w
    >> FROM pgbench_accounts WHERE aid <= 10000
    >> WINDOW w AS (
    >> PARTITION BY bid
    >> ORDER BY aid
    >> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >> AFTER MATCH SKIP PAST LAST ROW
    >> INITIAL
    >> PATTERN (START UP+)
    >> DEFINE
    >> START AS TRUE,
    >> UP AS aid > PREV(aid)
    >> );
    
    This SQL took 322.5913 ms (average of 3 runs) in v24. With the v26
    patch, it takes 41.84 ms, which is over a 7x improvement. I also ran
    the SQL in the 100k-row case. v24 took 26 seconds. With the v26 patch
    it takes 1195.603 ms, which is over a 21x improvement.
    
    I think a pattern string ending with '+' is one of the common use
    cases of RPR, and I believe the improvement is useful for many RPR
    applications.
    
    I also made the following changes on top of v25.
    
    - Fix do_pattern_match to use the top memory context to store the
      compiled re cache. Previously it was in the per-query memory
      context. This caused trouble because do_pattern_match checks the
      cache existence using a static variable.
    
    - Refactor search_str_set, which is the workhorse of pattern
      matching, into multiple functions to make the logic easier to
      understand.
      
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  95. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2024-12-30T23:57:07Z

    > I have added further optimization to the v25 patch.
    > 
    > While generating possible input strings that may satisfy the pattern
    > string, it is possible to omit to run regexp in some cases. Since
    > regexp matching is heavy operation, especially if it is applied to
    > long string, it is beneficial for RPR to reduce the number of regexp
    > runs.
    > 
    > If the tail pattern variable has '+' quantifier and previously the
    > input string was confirmed to be matched the pattern string, and the
    > same character as the tail pattern string is added, we don't need run
    > regexp match again because the new input string surely matches the
    > pattern string. Suppose a pattern string is "ab+" and the current
    > input string is "ab" (this satisfies "ab+"). If the new input string
    > is "abb", then "abb" surely matches "ab+" too and we don't need to run
    > regexp again.
    > 
    > In v26 patch, by using the technique above I get performance
    > improvement.
    > 
    >>> EXPLAIN (ANALYZE)
    >>> SELECT aid, bid, count(*) OVER w
    >>> FROM pgbench_accounts WHERE aid <= 10000
    >>> WINDOW w AS (
    >>> PARTITION BY bid
    >>> ORDER BY aid
    >>> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >>> AFTER MATCH SKIP PAST LAST ROW
    >>> INITIAL
    >>> PATTERN (START UP+)
    >>> DEFINE
    >>> START AS TRUE,
    >>> UP AS aid > PREV(aid)
    >>> );
    > 
    > This SQL took 322.5913 ms (average in 3 runs) in v24. With v26 patch,
    > it takes 41.84 ms, which is over 7 times improvement. Also I run the
    > SQL in 100k row case. v23 took 26 seconds. With the v26 patch it takes
    > 1195.603 ms, which is over 21 times improvement.
    > 
    > I think a pattern string ended up with '+' is one of common use cases
    > of RPR, and I believe the improvement is useful for many RPR
    > applications.
    > 
    > I also add following changes to v25.
    > 
    > - Fix do_pattern_match to use the top memory context to store compiled
    >   re cache. Before it was in per query memory context. This causes a
    >   trouble because do_pattern_match checks the cache existence using
    >   a static variable.
    > 
    > - Refactor search_str_set, which is a workhorse of pattern matching,
    >   into multiple functions to understand the logic easier.
    
    CFBot complains about a compiler error in the v26 patch.
    Attached is the v27 patch to fix this. Some typos in comments are
    also fixed.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  96. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-01-11T05:46:11Z

    Attached are the v28 patches to implement a subset of the Row Pattern
    Recognition feature defined in the SQL standard.
    
    In this patch set:
    
    - Reduce the patch size. Compared with v27, the patch size is trimmed
      from 5296 lines down to 5073 lines. This is mainly achieved in the
      raw parser patch by removing unimplemented features from the
      grammar file.
    
    - Use the newly introduced makeStringInfoExt() instead of
      makeStringInfo() in nodeWindowAgg.c, which removes some unnecessary
      code and should give slightly better performance.
    
    - Fix bugs in the doc and add more description of the RPR syntax.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  97. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-03-13T12:51:29Z

    Attached are the v29 patch sets to implement the subset of row pattern
    recognition feature defined in the SQL standard.
    
    In this patch set:
    
    - Adjust 0003 and 0004 to deal with the recent changes made by 8b1b342.
     
    - Add tests to 0007 for CREATE VIEW/pg_get_viewdef.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  98. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-04-22T23:55:33Z

    Attached are the v30 patches, which just add a patch to change the
    default io_method parameter to sync, in the hope that this has some
    effect on the recent CFBot failure.
    https://commitfest.postgresql.org/patch/4460/
    https://cirrus-ci.com/task/6078653164945408
    which is similar to:
    https://www.postgresql.org/message-id/20250422.111139.1502127917165838403.ishii%40postgresql.org
    
    (Once the issue is resolved, the patch should be removed, of course)
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  99. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-04-23T01:07:29Z

    > Attached are the v30 patches, just adding a patch to change the
    > default io_method parameter to sync, expecting this affects something
    > to the recent CFbot failure.
    > https://commitfest.postgresql.org/patch/4460/
    > https://cirrus-ci.com/task/6078653164945408
    > which is similar to:
    > https://www.postgresql.org/message-id/20250422.111139.1502127917165838403.ishii%40postgresql.org
    
    CFBot has just run tests against the v30 patches and it has turned
    green again!  I guess io_method = sync definitely did the trick.
    Note that previously only the Windows Server 2019, VS 2019 - Meson &
    ninja test had failed.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  100. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-05-06T00:53:51Z

    > Attached are the v30 patches, just adding a patch to change the
    > default io_method parameter to sync, expecting this affects something
    > to the recent CFbot failure.
    > https://commitfest.postgresql.org/patch/4460/
    > https://cirrus-ci.com/task/6078653164945408
    > which is similar to:
    > https://www.postgresql.org/message-id/20250422.111139.1502127917165838403.ishii%40postgresql.org
    > 
    > (Once the issue resolved, the patch should be removed, of course)
    
    It seems this has been green since May 2, 2025.
    https://commitfest.postgresql.org/patch/5679/.
    
    The last time it turned red was April 16, 2025. Maybe something
    committed in between fixed the cause of the failure, but I don't
    know exactly which commit.
    
    Anyway, the v31 patches now remove the change to the default io_method
    parameter to see if they pass the Windows Server 2019, VS 2019 - Meson &
    ninja test.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  101. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-05-06T07:28:42Z

    >> Attached are the v30 patches, just adding a patch to change the
    >> default io_method parameter to sync, expecting this affects something
    >> to the recent CFbot failure.
    >> https://commitfest.postgresql.org/patch/4460/
    >> https://cirrus-ci.com/task/6078653164945408
    >> which is similar to:
    >> https://www.postgresql.org/message-id/20250422.111139.1502127917165838403.ishii%40postgresql.org
    >> 
    >> (Once the issue resolved, the patch should be removed, of course)
    > 
    > It seems this has turned to green since May 2, 2025.
    > https://commitfest.postgresql.org/patch/5679/.
    > 
    > The last time it turned to red was April 16, 2025. Maybe something
    > committed between the period solved the cause of red, but I don't know
    > exactly which commit.
    > 
    > Anyway v31 patches now remove the change to default io_method
    > parameter to see if it passes Windows Server 2019, VS 2019 - Meson &
    > ninja test.
    
    Now I see it passes the test.
    https://cirrus-ci.com/build/5671612202090496
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  102. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-08-16T08:45:52Z

    Attached are the v32 patches for Row pattern recognition.  Recent
    changes to doc/src/sgml/func.sgml required v31 to be rebased. Other
    than that, nothing has been changed.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  103. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-09-24T10:35:23Z

    Attached are the v33 patches for Row pattern recognition.  The
    difference from v32 is that the raw parse tree printing patch is not
    included in v33. This is because we now have
    debug_print_raw_parse.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  104. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-11-17T06:57:15Z

    Attached are the v34 patches for Row pattern recognition.
    Nothing has been changed. Just rebased.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  105. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-11-18T02:33:20Z

    Attached are the v35 patches for Row pattern recognition.  In the
    v34-0001 gram.y patch, the %type declarations for RPR were misplaced.
    v35-0001 fixes this. The other patches are not changed.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  106. Re: Row pattern recognition

    Chao Li <li.evan.chao@gmail.com> — 2025-11-18T05:03:08Z

    Hi Tatsuo-san,
    
    I just reviewed 0006 docs changes and 0001. I plan to slowly review the patch, maybe one commit a day. 
    
    > On Nov 18, 2025, at 10:33, Tatsuo Ishii <ishii@postgresql.org> wrote:
    > 
    > Attached are the v35 patches for Row pattern recognition.  In v34-0001
    > gram.y patch, %type for RPR were misplaced. v35-0001 fixes this. Other
    > patches are not changed.
    > 
    > Best regards,
    > --
    > Tatsuo Ishii
    > SRA OSS K.K.
    > English: http://www.sraoss.co.jp/index_en/
    > Japanese:http://www.sraoss.co.jp
    > <v35-0001-Row-pattern-recognition-patch-for-raw-parser.patch><v35-0002-Row-pattern-recognition-patch-parse-analysis.patch><v35-0003-Row-pattern-recognition-patch-rewriter.patch><v35-0004-Row-pattern-recognition-patch-planner.patch><v35-0005-Row-pattern-recognition-patch-executor.patch><v35-0006-Row-pattern-recognition-patch-docs.patch><v35-0007-Row-pattern-recognition-patch-tests.patch><v35-0008-Row-pattern-recognition-patch-typedefs.list.patch>
    
    I got a few comments, maybe just questions:
    
    1 - 0001 - kwlist.h
    ```
    +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
    ```
    
    Why do we add “define” as a reserved keyword? From the SQL example you put in 0006:
    ```
    <programlisting>
    SELECT company, tdate, price,
     first_value(price) OVER w,
     max(price) OVER w,
     count(price) OVER w
    FROM stock
     WINDOW w AS (
     PARTITION BY company
     ORDER BY tdate
     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
     AFTER MATCH SKIP PAST LAST ROW
     INITIAL
     PATTERN (LOWPRICE UP+ DOWN+)
     DEFINE
      LOWPRICE AS price &lt;= 100,
      UP AS price &gt; PREV(price),
      DOWN AS price &lt; PREV(price)
    );
    </programlisting>
    ```
    
    PARTITION is at the same level as DEFINE, but it’s not defined as a reserved keyword:
    ```
    PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
    ```
    
    Even in this patch, “initial”, “past”, “pattern” and “seek” are defined as unreserved. Why?
    
    So I just want to clarify.
    
    2 - 0001 - gram.y
    ```
    opt_row_pattern_initial_or_seek:
    			INITIAL		{ $$ = true; }
    			| SEEK
    				{
    					ereport(ERROR,
    							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    							 errmsg("SEEK is not supported"),
    							 errhint("Use INITIAL instead."),
    							 parser_errposition(@1)));
    				}
    			| /*EMPTY*/		{ $$ = true; }
    		;
    ```
    
    As SEEK is specially listed here, I guess it might be supported in the future. If that is true, would it be better to defer the semantic check to a later parse phase, which would make future work easier?
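
    To illustrate the suggestion, here is a small Python sketch of the
    two-phase split. It is not PostgreSQL code: the function names, the
    dict standing in for the raw node, and the exception class are all
    hypothetical.

```python
class FeatureNotSupported(Exception):
    """Stand-in for an ERRCODE_FEATURE_NOT_SUPPORTED error."""


def parse_initial_or_seek(token):
    """Grammar step: only record what was written, never reject SEEK."""
    if token is None or token == "INITIAL":
        return {"initial": True}       # INITIAL is also the default
    if token == "SEEK":
        return {"initial": False}
    raise SyntaxError(f"unexpected token: {token}")


def analyze_window(raw):
    """Later analysis step: the single place that rejects SEEK for now."""
    if not raw["initial"]:
        raise FeatureNotSupported("SEEK is not supported")
    return raw


print(analyze_window(parse_initial_or_seek("INITIAL")))  # {'initial': True}
```

    With this split, supporting SEEK later would only mean changing the
    analysis step, while the grammar rule stays the same.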
    
    3 - 0001 - parsenodes.h
    ```
    +/*
    + * RowPatternCommonSyntax - raw representation of row pattern common syntax
    + *
    + */
    +typedef struct RPCommonSyntax
    +{
    +	NodeTag		type;
    +	RPSkipTo	rpSkipTo;		/* Row Pattern AFTER MATCH SKIP type */
    +	bool		initial;		/* true if <row pattern initial or seek> is
    +								 * initial */
    +	List	   *rpPatterns;		/* PATTERN variables (list of A_Expr) */
    +	List	   *rpDefs;			/* row pattern definitions clause (list of
    +								 * ResTarget) */
    +} RPCommonSyntax;
    +
     /*
      * WindowDef - raw representation of WINDOW and OVER clauses
      *
    @@ -593,6 +618,7 @@ typedef struct WindowDef
     	char	   *refname;		/* referenced window name, if any */
     	List	   *partitionClause;	/* PARTITION BY expression list */
     	List	   *orderClause;	/* ORDER BY (list of SortBy) */
    +	RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */
    ```
    
    RP fields are defined directly in WindowClause, so why do we need the RPCommonSyntax wrapper? Can we define the RP fields directly in WindowDef as well?
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  107. Re: Row pattern recognition

    Chao Li <li.evan.chao@gmail.com> — 2025-11-18T05:05:52Z

    
    > On Nov 18, 2025, at 13:03, Chao Li <li.evan.chao@gmail.com> wrote:
    > 
    > Hi Tatsuo-san,
    > 
    > I just reviewed 0006 docs changes and 0001. I plan to slowly review the patch, maybe one commit a day. 
    > 
    >> On Nov 18, 2025, at 10:33, Tatsuo Ishii <ishii@postgresql.org> wrote:
    >> 
    >> Attached are the v35 patches for Row pattern recognition.  In v34-0001
    >> gram.y patch, %type for RPR were misplaced. v35-0001 fixes this. Other
    >> patches are not changed.
    >> 
    >> Best regards,
    >> --
    >> Tatsuo Ishii
    >> SRA OSS K.K.
    >> English: http://www.sraoss.co.jp/index_en/
    >> Japanese:http://www.sraoss.co.jp
    >> <v35-0001-Row-pattern-recognition-patch-for-raw-parser.patch><v35-0002-Row-pattern-recognition-patch-parse-analysis.patch><v35-0003-Row-pattern-recognition-patch-rewriter.patch><v35-0004-Row-pattern-recognition-patch-planner.patch><v35-0005-Row-pattern-recognition-patch-executor.patch><v35-0006-Row-pattern-recognition-patch-docs.patch><v35-0007-Row-pattern-recognition-patch-tests.patch><v35-0008-Row-pattern-recognition-patch-typedefs.list.patch>
    > 
    > I got a few comments, maybe just questions:
    > 
    > 1 - 0001 - kwlist.h
    > ```
    > +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
    > ```
    > 
    > Why do we add “define” as a reserved keyword? From the SQL example you put in 0006:
    > ```
    > <programlisting>
    > SELECT company, tdate, price,
    > first_value(price) OVER w,
    > max(price) OVER w,
    > count(price) OVER w
    > FROM stock
    > WINDOW w AS (
    > PARTITION BY company
    > ORDER BY tdate
    > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    > AFTER MATCH SKIP PAST LAST ROW
    > INITIAL
    > PATTERN (LOWPRICE UP+ DOWN+)
    > DEFINE
    >  LOWPRICE AS price &lt;= 100,
    >  UP AS price &gt; PREV(price),
    >  DOWN AS price &lt; PREV(price)
    > );
    > </programlisting>
    > ```
    > 
    > PARTITION is at the same level as DEFINE, but it’s not defined as a reserved keyword:
    > ```
    > PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
    > ```
    > 
    > Even in this patch,”initial”,”past”, “pattern” and “seek” are defined as unreserved, why?  
    > 
    > So I just want to clarify.
    > 
    > 2 - 0001 - gram.y
    > ```
    > opt_row_pattern_initial_or_seek:
    > INITIAL { $$ = true; }
    > | SEEK
    > {
    > ereport(ERROR,
    > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    > errmsg("SEEK is not supported"),
    > errhint("Use INITIAL instead."),
    > parser_errposition(@1)));
    > }
    > | /*EMPTY*/ { $$ = true; }
    > ;
    > ```
    > 
    > As SEEK is specially listed here, I guess it might be supported in future. If that is true, would it be better to defer the semantic check to later parse phase, which would future work easier.
    > 
    > 3 - 0001 - parsenodes.h
    > ```
    > +/*
    > + * RowPatternCommonSyntax - raw representation of row pattern common syntax
    > + *
    > + */
    > +typedef struct RPCommonSyntax
    > +{
    > + NodeTag type;
    > + RPSkipTo rpSkipTo; /* Row Pattern AFTER MATCH SKIP type */
    > + bool initial; /* true if <row pattern initial or seek> is
    > + * initial */
    > + List   *rpPatterns; /* PATTERN variables (list of A_Expr) */
    > + List   *rpDefs; /* row pattern definitions clause (list of
    > + * ResTarget) */
    > +} RPCommonSyntax;
    > +
    > /*
    >  * WindowDef - raw representation of WINDOW and OVER clauses
    >  *
    > @@ -593,6 +618,7 @@ typedef struct WindowDef
    > char   *refname; /* referenced window name, if any */
    > List   *partitionClause; /* PARTITION BY expression list */
    > List   *orderClause; /* ORDER BY (list of SortBy) */
    > + RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */
    > ```
    > 
    > RP fields are directly defined in WindowClause, then why do we need a wrapper of RPCommonSyntax? Can we directly define RP fields in WindowRef as well?
    > 
    
    4 - 0001 - parsenodes.h
    ```
    +	/* Row Pattern AFTER MACH SKIP clause */
    ```
    
    Typo: MACH -> MATCH
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  108. Re: Row pattern recognition

    Vik Fearing <vik@postgresfriends.org> — 2025-11-18T11:19:46Z

    On 18/11/2025 06:03, Chao Li wrote:
    > 1 - 0001 - kwlist.h
    > ```
    > +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
    > ```
    >
    > Why do we add “define” as a reserved keyword? From the SQL example you put in 0006:
    > ```
    > <programlisting>
    > SELECT company, tdate, price,
    >   first_value(price) OVER w,
    >   max(price) OVER w,
    >   count(price) OVER w
    > FROM stock
    >   WINDOW w AS (
    >   PARTITION BY company
    >   ORDER BY tdate
    >   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >   AFTER MATCH SKIP PAST LAST ROW
    >   INITIAL
    >   PATTERN (LOWPRICE UP+ DOWN+)
    >   DEFINE
    >    LOWPRICE AS price &lt;= 100,
    >    UP AS price &gt; PREV(price),
    >    DOWN AS price &lt; PREV(price)
    > );
    > </programlisting>
    > ```
    >
    > PARTITION is at the same level as DEFINE, but it’s not defined as a reserved keyword:
    > ```
    > PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
    > ```
    >
    > Even in this patch, “initial”, “past”, “pattern” and “seek” are defined as unreserved, why?
    >
    > So I just want to clarify.
    
    
    Because of position. Without making DEFINE a reserved keyword, how do 
    you know that it isn't another variable in the PATTERN clause?
    
    -- 
    
    Vik Fearing
    
    
    
    
    
  109. Re: Row pattern recognition

    Chao Li <li.evan.chao@gmail.com> — 2025-11-19T04:14:03Z

    
    > On Nov 18, 2025, at 19:19, Vik Fearing <vik@postgresfriends.org> wrote:
    > 
    > 
    > Because of position. Without making DEFINE a reserved keyword, how do you know that it isn't another variable in the PATTERN clause?
    > 
    
    Ah, thanks for the clarification. Now I got it.
    
    I’m continuing to review 0002.
    
    5 - 0002 - parse_clause.c
    ```
    +		ereport(ERROR,
    +				(errcode(ERRCODE_SYNTAX_ERROR),
    +				 errmsg("FRAME must start at current row when row patttern recognition is used")));
    ```
    
    Nit typo: patttern -> pattern
    
    6 - 0002 - parse_clause.c
    ```
    +	/* DEFINE variable name initials */
    +	static char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz";
    ```
    
    This string can also be const, so “static const char *”
    
    7 - 0002 - parse_clause.c
    ```
    +	/*
    +	 * Create list of row pattern DEFINE variable name's initial. We assign
    +	 * [a-z] to them (up to 26 variable names are allowed).
    +	 */
    +	restargets = NIL;
    +	i = 0;
    +	initialLen = strlen(defineVariableInitials);
    +
    +	foreach(lc, windef->rpCommonSyntax->rpDefs)
    +	{
    +		char		initial[2];
    +
    +		restarget = (ResTarget *) lfirst(lc);
    +		name = restarget->name;
    ```
    
    Looks like “name” is not used inside “foreach”.
    
    8 - 0002 - parse_clause.c
    ```
    +	/*
    +	 * Create list of row pattern DEFINE variable name's initial. We assign
    +	 * [a-z] to them (up to 26 variable names are allowed).
    +	 */
    +	restargets = NIL;
    +	i = 0;
    +	initialLen = strlen(defineVariableInitials);
    +
    +	foreach(lc, windef->rpCommonSyntax->rpDefs)
    +	{
    +		char		initial[2];
    ```
    
    I guess this “foreach” for building the initial list can be combined into the previous foreach loop that checks for duplicates. If a def has no dup, then assign an initial to it.
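    
    The merged loop suggested in item 8 could look roughly like the standalone sketch below. It is not code from the patch: assign_initials and MAX_DEFINE_VARS are hypothetical names, and plain C strings stand in for the ResTarget nodes that the real loop walks.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>
    
    /* Hypothetical limit mirroring the [a-z] initials used in the patch. */
    #define MAX_DEFINE_VARS 26
    
    static const char *const define_variable_initials = "abcdefghijklmnopqrstuvwxyz";
    
    /*
     * Walk the DEFINE variable names once: reject a duplicate, otherwise
     * assign the next initial. Returns the number of initials assigned,
     * -1 on a duplicate name, or -2 if there are more than 26 names.
     * "initials" must have room for nnames + 1 chars.
     */
    static int
    assign_initials(const char *const *names, size_t nnames, char *initials)
    {
        size_t i, j;
    
        assert(names != NULL && initials != NULL);
    
        if (nnames > MAX_DEFINE_VARS)
            return -2;
    
        for (i = 0; i < nnames; i++)
        {
            /* duplicate check against all earlier names */
            for (j = 0; j < i; j++)
            {
                if (strcmp(names[i], names[j]) == 0)
                    return -1;
            }
            /* no duplicate: this variable gets the i-th initial */
            initials[i] = define_variable_initials[i];
        }
        initials[nnames] = '\0';
        return (int) nnames;
    }
    ```
    
    With the names start, up and down this sketch assigns a, b and c in a single pass, and it stops at the first repeated name instead of needing a second loop.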
    
    9 - 0002 - parse_clause.c
    ```
    +static void
    +transformPatternClause(ParseState *pstate, WindowClause *wc,
    +					   WindowDef *windef)
    +{
    +	ListCell   *lc;
    +
    +	/*
    +	 * Row Pattern Common Syntax clause exists?
    +	 */
    +	if (windef->rpCommonSyntax == NULL)
    +		return;
    ```
    
    Checking  (windef->rpCommonSyntax == NULL) seems a duplicate, because transformPatternClause() is only called by transformRPR(), and transformRPR() has done the check.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  110. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-11-19T06:51:02Z

    > On 18/11/2025 06:03, Chao Li wrote:
    >> 1 - 0001 - kwlist.h
    >> ```
    >> +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
    >> ```
    >>
    >> Why do we add “define” as a reserved keyword? From the SQL example
    >> you put in 0006:
    >> ```
    >> <programlisting>
    >> SELECT company, tdate, price,
    >>   first_value(price) OVER w,
    >>   max(price) OVER w,
    >>   count(price) OVER w
    >> FROM stock
    >>   WINDOW w AS (
    >>   PARTITION BY company
    >>   ORDER BY tdate
    >>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >>   AFTER MATCH SKIP PAST LAST ROW
    >>   INITIAL
    >>   PATTERN (LOWPRICE UP+ DOWN+)
    >>   DEFINE
    >>    LOWPRICE AS price &lt;= 100,
    >>    UP AS price &gt; PREV(price),
    >>    DOWN AS price &lt; PREV(price)
    >> );
    >> </programlisting>
    >> ```
    >>
    >> PARTITION is at the same level as DEFINE, but it’s not defined as a
    >> reserved keyword:
    >> ```
    >> PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
    >> ```
    >>
    >> Even in this patch, “initial”, “past”, “pattern” and “seek” are
    >> defined as unreserved, why?
    >>
    >> So I just want to clarify.
    > 
    > 
    > Because of position. Without making DEFINE a reserved keyword, how do
    > you know that it isn't another variable in the PATTERN clause?
    
    I think we don't need to worry about this because PATTERN_P is in the
    $nonassoc list in the patch, which gives PATTERN different precedence
    from DEFINE.
    
    @@ -888,6 +896,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
     %nonassoc	UNBOUNDED NESTED /* ideally would have same precedence as IDENT */
     %nonassoc	IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP
     			SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT PATH
    +			AFTER INITIAL SEEK PATTERN_P
    
    And I think we could change DEFINE to an unreserved keyword.  Attached
    is a patch to do that, on top of v35-0001.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  111. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-11-19T07:19:03Z

    > 2 - 0001 - gram.y
    > ```
    > opt_row_pattern_initial_or_seek:
    > 			INITIAL		{ $$ = true; }
    > 			| SEEK
    > 				{
    > 					ereport(ERROR,
    > 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    > 							 errmsg("SEEK is not supported"),
    > 							 errhint("Use INITIAL instead."),
    > 							 parser_errposition(@1)));
    > 				}
    > 			| /*EMPTY*/		{ $$ = true; }
    > 		;
    > ```
    > 
    > As SEEK is specially listed here, I guess it might be supported in the future. If that is true, would it be better to defer the semantic check to a later parse phase, which would make future work easier?
    
    From a comment in gram.y:
    
    /*
     * If we see PARTITION, RANGE, ROWS, GROUPS, AFTER, INITIAL, SEEK or PATTERN_P
     * as the first token after the '(' of a window_specification, we want the
     * assumption to be that there is no existing_window_name; but those keywords
     * are unreserved and so could be ColIds.  We fix this by making them have the
    
    For this purpose, we want INITIAL and SEEK to be unreserved keywords.
    
    > 3 - 0001 - parsenodes.h
    > ```
    > +/*
    > + * RowPatternCommonSyntax - raw representation of row pattern common syntax
    > + *
    > + */
    > +typedef struct RPCommonSyntax
    > +{
    > +	NodeTag		type;
    > +	RPSkipTo	rpSkipTo;		/* Row Pattern AFTER MATCH SKIP type */
    > +	bool		initial;		/* true if <row pattern initial or seek> is
    > +								 * initial */
    > +	List	   *rpPatterns;		/* PATTERN variables (list of A_Expr) */
    > +	List	   *rpDefs;			/* row pattern definitions clause (list of
    > +								 * ResTarget) */
    > +} RPCommonSyntax;
    > +
    >  /*
    >   * WindowDef - raw representation of WINDOW and OVER clauses
    >   *
    > @@ -593,6 +618,7 @@ typedef struct WindowDef
    >  	char	   *refname;		/* referenced window name, if any */
    >  	List	   *partitionClause;	/* PARTITION BY expression list */
    >  	List	   *orderClause;	/* ORDER BY (list of SortBy) */
    > +	RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */
    > ```
    > 
    > RP fields are defined directly in WindowClause, so why do we need the RPCommonSyntax wrapper? Can we define the RP fields directly in WindowDef as well?
    
    The row pattern common syntax defined in the standard is not only used
    in the WINDOW clause, but in the FROM clause. If we would support RPR
    in FROM clause in the future, it would be better to use the same code
    of row pattern common syntax in WINDOW clause as much as
    possible. That's the reason I created RPCommonSyntax. In the
    parse/analysis phase, I am not sure how the parse/analysis code would
    be in FROM clause at this point. So I did not define yet another
    struct for it.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  112. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-11-19T07:20:46Z

    > 4 - 0001 - parsenodes.h
    > ```
    > +	/* Row Pattern AFTER MACH SKIP clause */
    > ```
    > 
    > Typo: MACH -> MATCH
    
    Thanks for pointing it out. Will fix.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  113. Re: Row pattern recognition

    Chao Li <li.evan.chao@gmail.com> — 2025-11-20T07:33:48Z

    
    > On Nov 19, 2025, at 12:14, Chao Li <li.evan.chao@gmail.com> wrote:
    > 
    > 
    > 9 - 0002 - parse_clause.c
    
    I am continuing to review 0003
    
    10 - 0003
    ```
    +	Assert(list_length(patternVariables) == list_length(defineClause));
    ```
    
    Is this assert true? From what I learned, the pattern length doesn’t have to equal the define length. For example, I got an example from [1]:
    
    ```
    Example 4
    -- This example has three different patterns.
    -- Comment two of them, to get error-free query.
    SELECT company, price_date, price
      FROM stock_price
        MATCH_RECOGNIZE (
           partition by company
           order by price_date
           all rows per match
           pattern ( limit_50  up   up* down  down*  )
           define
               limit_50 as price <= 50.00,
               up   as price > prev(price),
               down as price < prev(price)
        )
       WHERE company = 'B'
       ORDER BY price_date;
    ```
    
    In this example, pattern has 5 elements and define has only 3 elements.
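    
    To make the count mismatch concrete, here is a small standalone sketch that counts the distinct variable names among the pattern elements. count_distinct_names is a hypothetical helper, not part of the patch, and whether patternVariables in 0003 is already deduplicated is not visible in the quoted hunk.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>
    
    /*
     * Count distinct names among the first n entries. The scan is quadratic,
     * which is acceptable for the small lists involved here.
     */
    static size_t
    count_distinct_names(const char *const *names, size_t n)
    {
        size_t distinct = 0;
        size_t i, j;
    
        assert(names != NULL || n == 0);
    
        for (i = 0; i < n; i++)
        {
            int seen = 0;
    
            /* has this name already appeared earlier in the list? */
            for (j = 0; j < i; j++)
            {
                if (strcmp(names[i], names[j]) == 0)
                {
                    seen = 1;
                    break;
                }
            }
            if (!seen)
                distinct++;
        }
        return distinct;
    }
    ```
    
    For the pattern above (limit_50, up, up, down, down) the element count is 5 while the distinct count is 3, so a comparison against the DEFINE list only holds if the pattern side is deduplicated first.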
    
    11 - 0004 - plannodes.h
    ```
    +	/* Row Pattern Recognition AFTER MACH SKIP clause */
    +	RPSkipTo	rpSkipTo;		/* Row Pattern Skip To type */
    ```
    
    Typo: MACH -> MATCH
    
    I’d stop here, and continue 0005 tomorrow.
    
    
    [1] https://link.springer.com/article/10.1007/s13222-022-00404-3
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  114. Re: Row pattern recognition

    Chao Li <li.evan.chao@gmail.com> — 2025-11-21T05:25:50Z

    
    > On Nov 20, 2025, at 15:33, Chao Li <li.evan.chao@gmail.com> wrote:
    > 
    > 
    > I’d stop here, and continue 0005 tomorrow.
    > 
    
    I reviewed 0005 today. I'm still not very familiar with the executor code, so going through 0005 has also been a valuable learning process for me.
    
    12 - 0005 - nodeWindowAgg.c
    ```
    +	if (string_set_get_size(str_set) == 0)
    +	{
    +		/* no match found in the first row */
    +		register_reduced_frame_map(winstate, original_pos, RF_UNMATCHED);
    +		destroyStringInfo(encoded_str);
    +		return;
    +	}
    ```
    
    encoded_str should also be destroyed if not entering the “if” clause.
    
    13 - 0005 - nodeWindowAgg.c
    ```
    static
    int
    do_pattern_match(char *pattern, char *encoded_str, int len)
    {
    static regex_t *regcache = NULL;
    static regex_t preg;
    static char patbuf[1024]; /* most recent 'pattern' is cached here */
    ```
    
    Using static variables in the executor is something I have never seen; they may persist data across queries. I think per-query data is usually stored in state structures.
    
    14 - 0005 - nodeWindowAgg.c
    ```
    +		MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
    +
    +		if (regcache != NULL)
    +			pg_regfree(regcache);	/* free previous re */
    +
    +		/* we need to convert to char to pg_wchar */
    +		plen = strlen(pattern);
    +		data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
    +		data_len = pg_mb2wchar_with_len(pattern, data, plen);
    +		/* compile re */
    +		sts = pg_regcomp(&preg, /* compiled re */
    +						 data,	/* target pattern */
    +						 data_len,	/* length of pattern */
    +						 cflags,	/* compile option */
    +						 C_COLLATION_OID	/* collation */
    +			);
    +		pfree(data);
    +
    +		MemoryContextSwitchTo(oldContext);
    ```
    
    Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
    ```
    /*
    * pg_regcomp - compile regular expression
    *
    * Note: on failure, no resources remain allocated, so pg_regfree()
    * need not be applied to re.
    */
    int
    pg_regcomp(regex_t *re,
    const chr *string,
    size_t len,
    int flags,
    Oid collation)
    ```
    
    “preg” should be freed by pg_regfree(). Given that the memory is allocated in TopMemoryContext, this looks like a memory leak.
    
    Okay, I’d stop here and continue to review 0006 next week.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  115. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-11-21T05:57:08Z

    Hi Chao,
    
    >> On Nov 18, 2025, at 19:19, Vik Fearing <vik@postgresfriends.org> wrote:
    >> 
    >> 
    >> Because of position. Without making DEFINE a reserved keyword, how do you know that it isn't another variable in the PATTERN clause?
    >> 
    > 
    > Ah, thanks for the clarification. Now I got it.
    > 
    > I’m continuing to review 0002.
    
    Thanks for the review!
    
    > 5 - 0002 - parse_clause.c
    > ```
    > +		ereport(ERROR,
    > +				(errcode(ERRCODE_SYNTAX_ERROR),
    > +				 errmsg("FRAME must start at current row when row patttern recognition is used")));
    > ```
    > 
    > Nit typo: patttern -> pattern
    
    Will fix (this involves regression test change because this changes
    the ERROR messages in the expected file).
    
    > 6 - 0002 - parse_clause.c
    > ```
    > +	/* DEFINE variable name initials */
    > +	static char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz";
    > ```
    > 
    > This string can also be const, so “static const char *”
    
    Agreed. Will fix.
    
    > 7 - 0002 - parse_clause.c
    > ```
    > +	/*
    > +	 * Create list of row pattern DEFINE variable name's initial. We assign
    > +	 * [a-z] to them (up to 26 variable names are allowed).
    > +	 */
    > +	restargets = NIL;
    > +	i = 0;
    > +	initialLen = strlen(defineVariableInitials);
    > +
    > +	foreach(lc, windef->rpCommonSyntax->rpDefs)
    > +	{
    > +		char		initial[2];
    > +
    > +		restarget = (ResTarget *) lfirst(lc);
    > +		name = restarget->name;
    > ```
    > 
    > Looks like “name” is not used inside “foreach”.
    
    Oops. Will fix.
    
    > 8 - 0002 - parse_clause.c
    > ```
    > +	/*
    > +	 * Create list of row pattern DEFINE variable name's initial. We assign
    > +	 * [a-z] to them (up to 26 variable names are allowed).
    > +	 */
    > +	restargets = NIL;
    > +	i = 0;
    > +	initialLen = strlen(defineVariableInitials);
    > +
    > +	foreach(lc, windef->rpCommonSyntax->rpDefs)
    > +	{
    > +		char		initial[2];
    > ```
    > 
    > I guess this “foreach” for building the initial list can be combined into the previous foreach loop that checks for duplicates. If a def has no dup, then assign an initial to it.
    
    You are right. Will change.
    
    > 9 - 0002 - parse_clause.c
    > ```
    > +static void
    > +transformPatternClause(ParseState *pstate, WindowClause *wc,
    > +					   WindowDef *windef)
    > +{
    > +	ListCell   *lc;
    > +
    > +	/*
    > +	 * Row Pattern Common Syntax clause exists?
    > +	 */
    > +	if (windef->rpCommonSyntax == NULL)
    > +		return;
    > ```
    > 
    > Checking  (windef->rpCommonSyntax == NULL) seems a duplicate, because transformPatternClause() is only called by transformRPR(), and transformRPR() has done the check.
    
    Yeah. transformDefineClause() already does a similar check using
    Assert. What about using Assert in transformPatternClause() as well?
    
    	Assert(windef->rpCommonSyntax != NULL);
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  116. Re: Row pattern recognition

    Chao Li <li.evan.chao@gmail.com> — 2025-11-24T00:59:57Z

    
    > On Nov 21, 2025, at 13:25, Chao Li <li.evan.chao@gmail.com> wrote:
    > 
    > 
    > Okay, I’d stop here and continue to review 0006 next week.
    > 
    
    I just finished reviewing 0006, and see some problems:
    
    15 - 0006 - select.sgml
    ```
    +[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
    ```
    
    row_pattern_common_syntax doesn’t look like a good name. I searched over the docs, and couldn't find a name suffixed by “_syntax”. I think we can just name it as “row_pattern_recognition_clause” or a shorter name just “row_pattern”.
    
    16 - 0006 - select.sgml
    ```
    + <synopsis>
    + [ AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW ]
    ```
    
    I think the two values are mutually exclusive, so curly braces should be added around them:
    
    [ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ]
    
    [] means optional, and {} means choose one from all alternatives.
    
    17 - 0006 - select.sgml
    ```
    PATTERN <replaceable class="parameter">pattern_variable_name</replaceable>[+] [, ...]
    ```
    
    PATTERN definition should be placed inside (), here you missed ()
    
    18 - 0006 - select.sgml
    In the same code as 17, <replaceable class="parameter">pattern_variable_name</replaceable>[+]: why do you only put “+” here? I think SQL allows many pattern quantifiers. From gram.y in 0001, it looks like +, * and ? are supported in this patch. So, maybe we can do:
    
    ```
    PATTERN ( pattern_element, [pattern_element …] )
    
    and pattern_element = variable name optionally followed by quantifier (reference to somewhere introducing pattern quantifier).
    ```
    
    19 - 0006 - select.sgml
    
    I don’t see INITIAL and SEEK described. Even if SEEK is not currently supported, it’s worth mentioning.
    
    20 - 0006 - select.sgml
    ```
    +    after a match found. With <literal>AFTER MATCH SKIP PAST LAST
    +    ROW</literal> (the default) next row position is next to the last row of
    ```
    
    21 - 0006 - select.sgml
    ```
     [ <replaceable class="parameter">frame_clause</replaceable> ]
    +[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
    ```
    
    Looks like row_pattern_common_syntax belongs to frame_clause, and you have lately added row_pattern_common_syntax to frame_clause:
    ```
     <synopsis>
    -{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
    -{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
    +{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax]
    +{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax]
     </synopsis>
    ```
    
    So I guess row_pattern_common_syntax after frame_clause should not be added.
    
    22 - 0006 - advance.sgml
    ```
    <programlisting>
    DEFINE
     LOWPRICE AS price &lt;= 100,
     UP AS price &gt; PREV(price),
     DOWN AS price &lt; PREV(price)
    </programlisting>
    
        Note that <function>PREV</function> returns the price column in the
    ```
    
    In the “Note” line, “price” refers to a column from above example, so I think it should be tagged by <literal>. This comment applies to multiple places.
    
    I will proceed 0007 tomorrow.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  117. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-11-24T04:57:25Z

    Hi Chao,
    
    Thank you for the review!
    
    >> On Nov 20, 2025, at 15:33, Chao Li <li.evan.chao@gmail.com> wrote:
    >> 
    >> 
    >> I’d stop here, and continue 0005 tomorrow.
    >> 
    > 
    > I reviewed 0005 today. I'm still not very familiar with the executor code, so going through 0005 has also been a valuable learning process for me.
    > 
    > 12 - 0005 - nodeWindowAgg.c
    > ```
    > +	if (string_set_get_size(str_set) == 0)
    > +	{
    > +		/* no match found in the first row */
    > +		register_reduced_frame_map(winstate, original_pos, RF_UNMATCHED);
    > +		destroyStringInfo(encoded_str);
    > +		return;
    > +	}
    > ```
    > 
    > encoded_str should also be destroyed if not entering the “if” clause.
    
    The subsequent string_set_discard() will free encoded_str, since encoded_str
    is part of str_set. So there is no need to call destroyStringInfo(encoded_str)
    when the "if" clause is not entered.
    
    	string_set_discard(str_set);
    
    So I think this is Ok.
    
    > 13 - 0005 - nodeWindowAgg.c
    > ```
    > static
    > int
    > do_pattern_match(char *pattern, char *encoded_str, int len)
    > {
    > static regex_t *regcache = NULL;
    > static regex_t preg;
    > static char patbuf[1024]; /* most recent 'pattern' is cached here */
    > ```
    > 
    > Using static variables in the executor is something I have never seen; they may persist data across queries. I think per-query data is usually stored in state structures.
    
    Yeah, such usage may be rare. But I hesitate to store the data
    (regex_t) in WindowAggState because:
    
    (1) it bloats WindowAggState which we want to avoid if possible
    (2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)
    
    > 14 - 0005 - nodeWindowAgg.c
    > ```
    > +		MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
    > +
    > +		if (regcache != NULL)
    > +			pg_regfree(regcache);	/* free previous re */
    > +
    > +		/* we need to convert to char to pg_wchar */
    > +		plen = strlen(pattern);
    > +		data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
    > +		data_len = pg_mb2wchar_with_len(pattern, data, plen);
    > +		/* compile re */
    > +		sts = pg_regcomp(&preg, /* compiled re */
    > +						 data,	/* target pattern */
    > +						 data_len,	/* length of pattern */
    > +						 cflags,	/* compile option */
    > +						 C_COLLATION_OID	/* collation */
    > +			);
    > +		pfree(data);
    > +
    > +		MemoryContextSwitchTo(oldContext);
    > ```
    > 
    > Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
    > ```
    > /*
    > * pg_regcomp - compile regular expression
    > *
    > * Note: on failure, no resources remain allocated, so pg_regfree()
    > * need not be applied to re.
    > */
    > int
    > pg_regcomp(regex_t *re,
    > const chr *string,
    > size_t len,
    > int flags,
    > Oid collation)
    > ```
    > 
    > “preg” should be freed by pg_regfree(). Given that the memory is allocated in TopMemoryContext, this looks like a memory leak.
    
    I admit current patch leaves the memory unfreed even after a query
    finishes. What about adding something like:
    
    static void do_pattern_match_end(void)
    {
    	if (regcache != NULL)
    		pg_regfree(regcache);
    }
    
    and let ExecEndWindowAgg() call it?
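    
    As an illustration of the alternative raised in item 13 (keeping the compiled regex in per-query state and freeing it when the query ends), here is a minimal standalone sketch. It uses POSIX <regex.h> and malloc() in place of PostgreSQL's pg_regcomp()/pg_regfree() and memory contexts, and PatternCache, pattern_cache_match and pattern_cache_end are hypothetical names that do not appear in the patch.
    
    ```c
    #include <assert.h>
    #include <regex.h>
    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>
    
    /* Hypothetical per-query cache; stands in for the static variables. */
    typedef struct PatternCache
    {
        bool        valid;      /* true if "preg" holds a compiled regex */
        char       *pattern;    /* copy of the most recently compiled pattern */
        regex_t     preg;       /* compiled form of "pattern" */
    } PatternCache;
    
    /* Release the cached regex; meant to run once when the query ends. */
    static void
    pattern_cache_end(PatternCache *cache)
    {
        if (cache->valid)
        {
            regfree(&cache->preg);
            free(cache->pattern);
            cache->pattern = NULL;
            cache->valid = false;
        }
    }
    
    /*
     * Match "str" against "pattern", recompiling only when the pattern changes.
     * Returns 1 on match, 0 on no match, -1 on compile or allocation failure.
     */
    static int
    pattern_cache_match(PatternCache *cache, const char *pattern, const char *str)
    {
        assert(cache != NULL && pattern != NULL && str != NULL);
    
        if (!cache->valid || strcmp(cache->pattern, pattern) != 0)
        {
            char       *copy;
    
            /* free the previous regex before replacing it */
            pattern_cache_end(cache);
    
            copy = malloc(strlen(pattern) + 1);
            if (copy == NULL)
                return -1;
            strcpy(copy, pattern);
    
            if (regcomp(&cache->preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
            {
                free(copy);
                return -1;
            }
            cache->pattern = copy;
            cache->valid = true;
        }
    
        return regexec(&cache->preg, str, 0, NULL, 0) == 0 ? 1 : 0;
    }
    ```
    
    In this shape the cache would be owned by the node's state and pattern_cache_end() would be called from the node's end routine, so nothing outlives the query. The trade-off mentioned above still applies: the state struct grows by one regex_t and its header has to see the regex type.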
    
    > Okay, I’d stop here and continue to review 0006 next week.
    > 
    > Best regards,
    > --
    > Chao Li (Evan)
    > HighGo Software Co., Ltd.
    > https://www.highgo.com/
    > 
    > 
    > 
    > 
    
    
    
    
  118. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-11-24T06:47:03Z

    Hi Chao,
    
    >> On Nov 21, 2025, at 13:25, Chao Li <li.evan.chao@gmail.com> wrote:
    >> 
    >> 
    >> Okay, I’d stop here and continue to review 0006 next week.
    >> 
    > 
    > I just finished reviewing 0006, and see some problems:
    > 
    > 15 - 0006 - select.sgml
    > ```
    > +[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
    > ```
    > 
    > row_pattern_common_syntax doesn’t look like a good name. I searched over the docs, and couldn't find a name suffixed by “_syntax”. I think we can just name it as “row_pattern_recognition_clause” or a shorter name just “row_pattern”.
    
    I believe these names are based on the SQL standard. Not all syntax
    components need to be suffixed with "clause". For example,
    in select.sgml:
    
    [ WITH [ RECURSIVE ] <replaceable class="parameter">with_query</replaceable> [, ...] ]
    SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replaceable> [, ...] ) ] ]
        [ { * | <replaceable class="parameter">expression</replaceable> [ [ AS ] <replaceable class="parameter">output_name</replaceable> ] } [, ...] ]
        [ FROM <replaceable class="parameter">from_item</replaceable> [, ...] ]
        [ WHERE <replaceable class="parameter">condition</replaceable> ]
        [ GROUP BY { ALL | [ ALL | DISTINCT ] <replaceable class="parameter">grouping_element</replaceable> [, ...] } ]
        [ HAVING <replaceable class="parameter">condition</replaceable> ]
        [ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceable class="parameter">window_definition</replaceable> ) [, ...] ]
    
    "window_definition" comes from the standard and is not suffixed by
    "clause". If you look into the window clause definition in the
    standard, you will see:
    
    <window clause> ::=
    WINDOW <window definition list>
    <window definition list> ::=
    <window definition> [ { <comma> <window definition> }... ]
    
    So the usage of terms in our document matches the standard. Likewise,
    we see the definition of row pattern common syntax in the standard:
    
    <window clause> ::=
    WINDOW <window definition list>
    <window definition list> ::=
    <window definition> [ { <comma> <window definition> }... ]
    <window definition> ::=
    <new window name> AS <window specification>
    <new window name> ::=
    <window name>
    <window specification> ::=
    <left paren> <window specification details> <right paren>
    <window specification details> ::=
    [ <existing window name> ]
    [ <window partition clause> ]
    [ <window order clause> ]
    [ <window frame clause> ]
    :
    :
    <window frame clause> ::=
    [ <row pattern measures> ]
    <window frame units> <window frame extent>
    [ <window frame exclusion> ]
    [ <row pattern common syntax> ]
    
    So I think "row pattern common syntax" is a perfectly appropriate
    name. If we change it to “row_pattern_recognition_clause”, it will
    just cause confusion. In the standard, “row pattern recognition
    clause” is the term defined for RPR in the FROM clause.
    
    SELECT ... FROM table MATCH_RECOGNIZE (....);
    
    Here MATCH_RECOGNIZE (....) is the “row pattern recognition clause”.
    
    > 16 - 0006 - select.sgml
    > ```
    > + <synopsis>
    > + [ AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW ]
    > ```
    > 
    > I think the two values are mutually exclusive, so curly braces should be added around them:
    > 
    > [ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ]
    > 
    > [] means optional, and {} means choose one from all alternatives.
    
    Agreed. Will fix.
    
    > 17 - 0006 - select.sgml
    > ```
    > PATTERN <replaceable class="parameter">pattern_variable_name</replaceable>[+] [, ...]
    > ```
    > 
    > PATTERN definition should be placed inside (), here you missed ()
    
    Good catch. Will fix.
    
    > 18 - 0006 - select.sgml
    > In the same code as 17, <replaceable class="parameter">pattern_variable_name</replaceable>[+]: why do you only put “+” here? I think SQL allows many pattern quantifiers. From gram.y in 0001, it looks like +, * and ? are supported in this patch. So, maybe we can do:
    > 
    > ```
    > PATTERN ( pattern_element, [pattern_element …] )
    > 
    > and pattern_element = variable name optionally followed by quantifier (reference to somewhere introducing pattern quantifier).
    > ```
    
    Currently only *, + and ? are supported, and I don't think it's worth
    inventing "pattern_element". (Actually, the standard defines a much more
    complex syntax for PATTERN. I think we can revisit this once basic
    support for the quantifiers *, + and ? has been brought into the core
    PostgreSQL code.)
    
    > 19 - 0006 - select.sgml
    > 
    > I don’t see INITIAL and SEEK described. Even if SEEK is not currently supported, it’s worth mentioning.
    
    Agreed. Will fix.
    
    > 20 - 0006 - select.sgml
    > ```
    > +    after a match found. With <literal>AFTER MATCH SKIP PAST LAST
    > +    ROW</literal> (the default) next row position is next to the last row of
    > ```
    > 
    > 21 - 0006 - select.sgml
    > ```
    >  [ <replaceable class="parameter">frame_clause</replaceable> ]
    > +[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
    > ```
    > 
    > Looks like row_pattern_common_syntax belongs to frame_clause, and you have lately added row_pattern_common_syntax to frame_clause:
    > ```
    >  <synopsis>
    > -{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
    > -{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
    > +{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax]
    > +{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax]
    >  </synopsis>
    > ```
    > 
    > So I guess row_pattern_common_syntax after frame_clause should not be added.
    
    Yes, row_pattern_common_syntax belongs to frame_clause. Will fix.
    
    > 22 - 0006 - advanced.sgml
    > ```
    > <programlisting>
    > DEFINE
    >  LOWPRICE AS price &lt;= 100,
    >  UP AS price &gt; PREV(price),
    >  DOWN AS price &lt; PREV(price)
    > </programlisting>
    > 
    >     Note that <function>PREV</function> returns the price column in the
    > ```
    > 
    > In the “Note” line, “price” refers to a column from above example, so I think it should be tagged by <literal>. This comment applies to multiple places.
    
    You are right. Will fix.
    
    > I will proceed 0007 tomorrow.
    
    Thanks!
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  119. Re: Row pattern recognition

    Chao Li <li.evan.chao@gmail.com> — 2025-11-25T06:16:10Z

    
    > On Nov 24, 2025, at 08:59, Chao Li <li.evan.chao@gmail.com> wrote:
    > 
    > I will proceed 0007 tomorrow.
    
    I just finished reviewing 0007 and 0008. This patch set really demonstrates the full lifecycle of adding a SQL feature, from changing the syntax in gram.y all the way down to the executor, including tests and docs. I learned a lot from it. Thanks!
    
    23 - 0007
    
    As you mentioned earlier, the pattern regular expression supports *, + and ?, but I don’t see ? being tested.
    
    24 - 0007
    
    I don’t see negative tests for unsupported quantifiers, like PATTERN (A+?).
    
    25 - 0007
    ```
    +-- basic test with none greedy pattern
    ```
    
    Typo: “none greedy” -> non-greedy
    
    No comment for 0008.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  120. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-11-27T01:16:42Z

    > I just finished reviewing 0007 and 0008. This patch set really demonstrates the full lifecycle of adding a SQL feature, from changing the syntax in gram.y all the way down to the executor, including tests and docs. I learned a lot from it. Thanks!
    
    You are welcome!
    
    > 23 - 0007
    > 
    > As you mentioned earlier, the pattern regular expression supports *, + and ?, but I don’t see ? being tested.
    
    Good catch. I will add the test case. In the meantime I found a bug
    in the gram.y part that handles the '?' quantifier.  gram.y compiles
    successfully, but there is no way to put a '?' quantifier in a SQL
    statement.  We cannot directly write "ColId '?'" in the grammar, as
    we do for the other quantifiers '*' and '+', because '?' is not a
    "self" token.  So we let '?' match Op first, then check whether it
    is '?' or not.
    
    			| ColId Op
    				{
    					if (strcmp("?", $2))
    						ereport(ERROR,
    								errcode(ERRCODE_SYNTAX_ERROR),
    								errmsg("unsupported quantifier"),
    								parser_errposition(@2));
    
    					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1);
    				}
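
    The check in this action can be sketched in isolation. The helper
    below is hypothetical (its name is not from the patch) and only
    mirrors the strcmp() test, assuming the lexer delivers the whole
    operator string as one Op token: a lone "?" is accepted, while a
    multi-character operator such as "+?" is rejected.

    ```c
    #include <stdbool.h>
    #include <string.h>

    /*
     * Hypothetical stand-in for the grammar action: return true only when
     * the operator token is exactly "?".  In the real action a false result
     * corresponds to the "unsupported quantifier" ereport().
     */
    bool
    is_supported_op_quantifier(const char *op)
    {
        return strcmp(op, "?") == 0;
    }
    ```

    Under that assumption, PATTERN (A?) passes the check, and
    PATTERN (A+?) reaches the error branch because its Op token is "+?".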
    
    Another bug was in the executor (nodeWindowAgg.c). The code that
    checks for greedy quantifiers missed the '?' case.
    
    > 24 - 0007
    > 
    > I don’t see negative tests for unsupported quantifiers, like PATTERN (A+?).
    
    I will add such test cases.
    
    > 25 - 0007
    > ```
    > +-- basic test with none greedy pattern
    > ```
    > 
    > Typo: “none greedy” -> non-greedy
    
    Will fix.
    
    > No comment for 0008.
    
    Thanks again for the review. I will post v35 patch set soon.  Attached
    is the diff from v34.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  121. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-11-27T03:10:08Z

    Hi Chao,
    
    Any comment on this?
    
    >> 13 - 0005 - nodeWindowAgg.c
    >> ```
    >> static
    >> int
    >> do_pattern_match(char *pattern, char *encoded_str, int len)
    >> {
    >> static regex_t *regcache = NULL;
    >> static regex_t preg;
    >> static char patbuf[1024]; /* most recent 'pattern' is cached here */
    >> ```
    >> 
    >> Using static variable in executor seems something I have never seen, which may persistent data across queries. I think usually per query data are stored in state structures.
    >
    >Yeah, such a usage maybe rare. But I hesitate to store the data
    >(regex_t) in WindowAggState because:
    >
    >(1) it bloats WindowAggState which we want to avoid if possible
    >(2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)
    >
    >> 14 - 0005 - nodeWindowAgg.c
    >> ```
    >> +		MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
    >> +
    >> +		if (regcache != NULL)
    >> +			pg_regfree(regcache);	/* free previous re */
    >> +
    >> +		/* we need to convert to char to pg_wchar */
    >> +		plen = strlen(pattern);
    >> +		data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
    >> +		data_len = pg_mb2wchar_with_len(pattern, data, plen);
    >> +		/* compile re */
    >> +		sts = pg_regcomp(&preg, /* compiled re */
    >> +						 data,	/* target pattern */
    >> +						 data_len,	/* length of pattern */
    >> +						 cflags,	/* compile option */
    >> +						 C_COLLATION_OID	/* collation */
    >> +			);
    >> +		pfree(data);
    >> +
    >> +		MemoryContextSwitchTo(oldContext);
    >> ```
    >> 
    >> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
    >> ```
    >> /*
    >> * pg_regcomp - compile regular expression
    >> *
    >> * Note: on failure, no resources remain allocated, so pg_regfree()
    >> * need not be applied to re.
    >> */
    >> int
    >> pg_regcomp(regex_t *re,
    >> const chr *string,
    >> size_t len,
    >> int flags,
    >> Oid collation)
    >> ```
    >> 
    >> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory leak.
    >
    >I admit current patch leaves the memory unfreed even after a query
    >finishes. What about adding something like:
    >
    >static void do_pattern_match_end(void)
    >{
    >	if (regcache != NULL)
    >		pg_regfree(regcache);
    >}
    >
    >and let ExecEndWindowAgg() call it?
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  122. Re: Row pattern recognition

    Chao Li <li.evan.chao@gmail.com> — 2025-11-27T06:56:02Z

    
    > On Nov 27, 2025, at 11:10, Tatsuo Ishii <ishii@postgresql.org> wrote:
    > 
    > Hi Chao,
    > 
    > Any comment on this?
    > 
    >>> 13 - 0005 - nodeWindowAgg.c
    >>> ```
    >>> static
    >>> int
    >>> do_pattern_match(char *pattern, char *encoded_str, int len)
    >>> {
    >>> static regex_t *regcache = NULL;
    >>> static regex_t preg;
    >>> static char patbuf[1024]; /* most recent 'pattern' is cached here */
    >>> ```
    >>> 
    >>> Using static variable in executor seems something I have never seen, which may persistent data across queries. I think usually per query data are stored in state structures.
    >> 
    >> Yeah, such a usage maybe rare. But I hesitate to store the data
    >> (regex_t) in WindowAggState because:
    >> 
    >> (1) it bloats WindowAggState which we want to avoid if possible
    >> (2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)
    
    With static function-scope variables, the data persists across queries, which is error-prone. I am afraid that even if I let this pass, other reviewers might have the same concern.
    
    Maybe define a sub-structure, say WindowAggCache, put a pointer to it in WindowAggState, and allocate its memory only when a query needs it.
    
    >> 
    >>> 14 - 0005 - nodeWindowAgg.c
    >>> ```
    >>> + MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
    >>> +
    >>> + if (regcache != NULL)
    >>> + pg_regfree(regcache); /* free previous re */
    >>> +
    >>> + /* we need to convert to char to pg_wchar */
    >>> + plen = strlen(pattern);
    >>> + data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
    >>> + data_len = pg_mb2wchar_with_len(pattern, data, plen);
    >>> + /* compile re */
    >>> + sts = pg_regcomp(&preg, /* compiled re */
    >>> + data, /* target pattern */
    >>> + data_len, /* length of pattern */
    >>> + cflags, /* compile option */
    >>> + C_COLLATION_OID /* collation */
    >>> + );
    >>> + pfree(data);
    >>> +
    >>> + MemoryContextSwitchTo(oldContext);
    >>> ```
    >>> 
    >>> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
    >>> ```
    >>> /*
    >>> * pg_regcomp - compile regular expression
    >>> *
    >>> * Note: on failure, no resources remain allocated, so pg_regfree()
    >>> * need not be applied to re.
    >>> */
    >>> int
    >>> pg_regcomp(regex_t *re,
    >>> const chr *string,
    >>> size_t len,
    >>> int flags,
    >>> Oid collation)
    >>> ```
    >>> 
    >>> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory leak.
    >> 
    >> I admit current patch leaves the memory unfreed even after a query
    >> finishes. What about adding something like:
    >> 
    >> static void do_pattern_match_end(void)
    >> {
    >> if (regcache != NULL)
    >> pg_regfree(regcache);
    >> }
    >> 
    >> and let ExecEndWindowAgg() call it?
    > 
    
    I’m not sure. But I think if we move the data into WindowAggState, we don’t have to use TopMemoryContext here.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  123. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-12-01T06:57:02Z

    Hi Chao,
    
    Sorry, I missed this email.
    
    >> 9 - 0002 - parse_clause.c
    > 
    > I am continuing to review 0003
    > 
    > 10 - 0003
    > ```
    > +	Assert(list_length(patternVariables) == list_length(defineClause));
    > ```
    > 
    > Is this assert true? From what I learned, pattern length doesn’t have to equal to define length. For example, I got an example from [1]:
    
    You are right. If the PATTERN clause uses the same pattern variable
    more than once (which is perfectly valid), the assertion fails. I
    will remove the assertion and fix the subsequent forboth loop.
    
    > ```
    > Example 4
    > -- This example has three different patterns.
    > -- Comment two of them, to get error-free query.
    > SELECT company, price_date, price
    >   FROM stock_price
    >     MATCH_RECOGNIZE (
    >        partition by company
    >        order by price_date
    >        all rows per match
    >        pattern ( limit_50  up   up* down  down*  )
    >        define
    >            limit_50 as price <= 50.00,
    >            up   as price > prev(price),
    >            down as price < prev(price)
    >     )
    >    WHERE company = 'B'
    >    ORDER BY price_date;
    > ```
    > 
    > In this example, pattern has 5 elements and define has only 3 elements.
    
    Yes.
    
    > 11 - 0004 - plannodes.h
    > ```
    > +	/* Row Pattern Recognition AFTER MACH SKIP clause */
    > +	RPSkipTo	rpSkipTo;		/* Row Pattern Skip To type */
    > ```
    > 
    > Typo: MACH -> MATCH
    
    Will fix.
    
    > I’d stop here, and continue 0005 tomorrow.
    
    Thanks!
    
    > [1] https://link.springer.com/article/10.1007/s13222-022-00404-3
    > 
    > Best regards,
    > --
    > Chao Li (Evan)
    > HighGo Software Co., Ltd.
    > https://www.highgo.com/
    > 
    > 
    > 
    > 
    
  124. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-12-01T12:21:38Z

    >> On Nov 27, 2025, at 11:10, Tatsuo Ishii <ishii@postgresql.org> wrote:
    >> 
    >> Hi Chao,
    >> 
    >> Any comment on this?
    >> 
    >>>> 13 - 0005 - nodeWindowAgg.c
    >>>> ```
    >>>> static
    >>>> int
    >>>> do_pattern_match(char *pattern, char *encoded_str, int len)
    >>>> {
    >>>> static regex_t *regcache = NULL;
    >>>> static regex_t preg;
    >>>> static char patbuf[1024]; /* most recent 'pattern' is cached here */
    >>>> ```
    >>>> 
    >>>> Using static variable in executor seems something I have never seen, which may persistent data across queries. I think usually per query data are stored in state structures.
    >>> 
    >>> Yeah, such a usage maybe rare. But I hesitate to store the data
    >>> (regex_t) in WindowAggState because:
    >>> 
    >>> (1) it bloats WindowAggState which we want to avoid if possible
    >>> (2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)
    > 
    > With static function-scope variables, the data persists across queries, which is error-prone. I am afraid that even if I let this pass, other reviewers might have the same concern.
    > 
    > Maybe define a sub-structure, say WindowAggCache, put a pointer to it in WindowAggState, and allocate its memory only when a query needs it.
    > 
    >>> 
    >>>> 14 - 0005 - nodeWindowAgg.c
    >>>> ```
    >>>> + MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
    >>>> +
    >>>> + if (regcache != NULL)
    >>>> + pg_regfree(regcache); /* free previous re */
    >>>> +
    >>>> + /* we need to convert to char to pg_wchar */
    >>>> + plen = strlen(pattern);
    >>>> + data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
    >>>> + data_len = pg_mb2wchar_with_len(pattern, data, plen);
    >>>> + /* compile re */
    >>>> + sts = pg_regcomp(&preg, /* compiled re */
    >>>> + data, /* target pattern */
    >>>> + data_len, /* length of pattern */
    >>>> + cflags, /* compile option */
    >>>> + C_COLLATION_OID /* collation */
    >>>> + );
    >>>> + pfree(data);
    >>>> +
    >>>> + MemoryContextSwitchTo(oldContext);
    >>>> ```
    >>>> 
    >>>> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
    >>>> ```
    >>>> /*
    >>>> * pg_regcomp - compile regular expression
    >>>> *
    >>>> * Note: on failure, no resources remain allocated, so pg_regfree()
    >>>> * need not be applied to re.
    >>>> */
    >>>> int
    >>>> pg_regcomp(regex_t *re,
    >>>> const chr *string,
    >>>> size_t len,
    >>>> int flags,
    >>>> Oid collation)
    >>>> ```
    >>>> 
    >>>> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory leak.
    >>> 
    >>> I admit current patch leaves the memory unfreed even after a query
    >>> finishes. What about adding something like:
    >>> 
    >>> static void do_pattern_match_end(void)
    >>> {
    >>> if (regcache != NULL)
    >>> pg_regfree(regcache);
    >>> }
    >>> 
    >>> and let ExecEndWindowAgg() call it?
    >> 
    > 
    > I’m not sure. But I think if we move the data into WindowAggState, we don’t have to use TopMemoryContext here.
    
    I decided to add new fields to WindowAggState:
    
    	/* regular expression compiled result cache. Used for RPR. */
    	char	   *patbuf;			/* pattern to be compiled */
    	regex_t		preg;			/* compiled re pattern */
    
    Then allocate the memory for them using
    winstate->ss.ps.ps_ExprContext->ecxt_per_query_memory;
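
    As an illustration of keeping the compiled pattern in per-query
    state rather than in function-scope statics, here is a minimal
    standalone sketch. It is not the patch code: PatternCache and the
    pattern_cache_* functions are hypothetical names, POSIX
    regcomp()/regexec() stand in for pg_regcomp()/pg_regexec(), and
    malloc()/free() stand in for allocation in the per-query memory
    context.

    ```c
    #include <regex.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical per-query cache; stands in for the new state fields. */
    typedef struct PatternCache
    {
        char       *patbuf;     /* pattern most recently compiled, or NULL */
        regex_t     preg;       /* compiled pattern, valid iff patbuf != NULL */
    } PatternCache;

    void
    pattern_cache_init(PatternCache *cache)
    {
        cache->patbuf = NULL;
    }

    /* Release the compiled pattern; called once when the query ends. */
    void
    pattern_cache_end(PatternCache *cache)
    {
        if (cache->patbuf != NULL)
        {
            regfree(&cache->preg);
            free(cache->patbuf);
            cache->patbuf = NULL;
        }
    }

    /*
     * Match str against pattern, recompiling only when the pattern differs
     * from the cached one.  Returns 1 on match, 0 on no match, -1 on error.
     */
    int
    pattern_cache_match(PatternCache *cache, const char *pattern, const char *str)
    {
        if (cache->patbuf == NULL || strcmp(cache->patbuf, pattern) != 0)
        {
            char       *copy;

            pattern_cache_end(cache);   /* free the previous pattern, if any */

            copy = malloc(strlen(pattern) + 1);
            if (copy == NULL)
                return -1;
            strcpy(copy, pattern);

            if (regcomp(&cache->preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
            {
                free(copy);
                return -1;
            }
            cache->patbuf = copy;
        }
        return regexec(&cache->preg, str, 0, NULL, 0) == 0 ? 1 : 0;
    }
    ```

    Because the cache lives in a structure owned by the query, nothing
    survives into the next query, and a single cleanup call at executor
    shutdown releases the compiled pattern.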
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
    
    
    
  125. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-12-01T12:42:18Z

    >> I just finished reviewing 0007 and 0008. This patch set really demonstrates the full lifecycle of adding a SQL feature, from changing the syntax in gram.y all the way down to the executor, including tests and docs. I learned a lot from it. Thanks!
    > 
    > You are welcome!
    > 
    >> 23 - 0007
    >> 
    >> As you mentioned earlier, the pattern regular expression supports *, + and ?, but I don’t see ? being tested.
    > 
    > Good catch. I will add the test case. In the meantime I found a bug
    > in the gram.y part that handles the '?' quantifier.  gram.y compiles
    > successfully, but there is no way to put a '?' quantifier in a SQL
    > statement.  We cannot directly write "ColId '?'" in the grammar, as
    > we do for the other quantifiers '*' and '+', because '?' is not a
    > "self" token.  So we let '?' match Op first, then check whether it
    > is '?' or not.
    > 
    > 			| ColId Op
    > 				{
    > 					if (strcmp("?", $2))
    > 						ereport(ERROR,
    > 								errcode(ERRCODE_SYNTAX_ERROR),
    > 								errmsg("unsupported quantifier"),
    > 								parser_errposition(@2));
    > 
    > 					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1);
    > 				}
    > 
    > Another bug was in the executor (nodeWindowAgg.c). The code that
    > checks for greedy quantifiers missed the '?' case.
    > 
    >> 24 - 0007
    >> 
    >> I don’t see negative tests for unsupported quantifiers, like PATTERN (A+?).
    > 
    > I will add such test cases.
    > 
    >> 25 - 0007
    >> ```
    >> +-- basic test with none greedy pattern
    >> ```
    >> 
    >> Typo: “none greedy” -> non-greedy
    > 
    > Will fix.
    > 
    >> No comment for 0008.
    > 
    > Thanks again for the review. I will post v35 patch set soon.  Attached
    > is the diff from v34.
    
    Attached are the v35 patches for Row pattern recognition.  Chao Li
    reviewed v34 thoroughly. Thank you! v35 reflects the review
    comments. Major differences from v34 include:
    
    - Make "DEFINE" an unreserved keyword. Previously it was a reserved keyword.
    - Refactor transformDefineClause() to merge two foreach loops into a single foreach loop.
    - Make the '?' quantifier in PATTERN work as advertised, and add a test for it.
    - Add a test for unsupported quantifiers.
    - Fix get_rule_define().
    - Fix a memory leak related to regcomp.
    - Move the regcomp compiled-result cache from static data to WindowAggState.
    - Fix several typos.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  126. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-12-23T13:26:54Z

    Attached are the v36 patches, just for rebasing.
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp
    
  127. Re: Row pattern recognition

    Henson Choi <assam258@gmail.com> — 2025-12-31T12:10:53Z

    SQL/RPR Patch Review Report
    ============================
    
    Patch: Row Pattern Recognition (SQL:2016)
    Commitfest: https://commitfest.postgresql.org/patch/4460
    
    Review Methodology:
      This review focused on quality assessment, not line-by-line code audit.
      Key code paths and quality issues were examined with surrounding context
      when concerns arose. Documentation files were reviewed with AI-assisted
      grammar and typo checking. Code coverage was measured using gcov and
      custom analysis tools.
    
    Limitations:
      - Not a security audit or formal verification
      - Parser and executor internals reviewed at module level, not exhaustively
      - Performance characteristics not benchmarked
    
    
    TABLE OF CONTENTS
    -----------------
    
    1. Executive Summary
    2. Issues Found
       2.1 Critical / Major
       2.2 Minor Issues (Review Needed)
       2.3 Minor Issues (Style)
       2.4 Suggestions for Discussion
    3. Test Coverage
       3.1 Covered Areas
       3.2 Untested Items
       3.3 Unimplemented Features (No Test Needed)
    4. Code Coverage Analysis
       4.1 Overall Coverage
       4.2 Coverage by File
       4.3 Uncovered Code Risk Assessment
    5. Commit Analysis
    6. Recommendations
    7. Conclusion
    
    
    1. EXECUTIVE SUMMARY
    --------------------
    
    Overall assessment: GOOD
    
    The SQL/RPR patch demonstrates solid implementation quality within its
    defined scope. Code follows PostgreSQL coding standards (with minor style
    issues), test coverage is comprehensive at 96.4%, and documentation is
    thorough with only minor typos.
    
    Rating by Area:
    - Code Quality:     Good (PostgreSQL style compliant, 26 static style fixes recommended)
    - Test Coverage:    Good (96.4% line coverage, 28 test cases)
    - Documentation:    Good (Complete, 1 minor typo)
    - Build/Regress:    Pass (make check-world, 753 tests passed)
    
    
    2. ISSUES FOUND
    ---------------
    
    2.1 Critical / Major
    
    #1 [Code] Greedy pattern combinatorial explosion
       Pattern: PATTERN (A+ B+ C+) with DEFINE A AS TRUE, B AS TRUE, C AS TRUE
       Impact: Memory exhaustion or infinite wait
       Recommendation: Add work_mem-based memory limit (error on exceed)
    
       Evidence - No memory limit in current code:
       - nodeWindowAgg.c:5718-5722 string_set_init(): palloc() unconditional
       - nodeWindowAgg.c:5741-5750 string_set_add(): set_size *= 2; repalloc() unlimited
       - nodeWindowAgg.c:5095-5174 generate_patterns(): Triple loop, no limit
    
       Only work_mem usage in RPR (nodeWindowAgg.c:1297):
         winstate->buffer = tuplestore_begin_heap(false, false, work_mem);
       -> For tuple buffer, unrelated to pattern combination memory (StringSet)
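
       The recommended limit can be sketched as a growable string set
       that tracks its own byte usage and refuses to grow past a budget.
       This is a standalone illustration, not the patch's StringSet:
       BoundedStringSet, the bss_* functions and bytes_limit are
       hypothetical, malloc()/realloc() stand in for palloc()/repalloc(),
       and a false return stands in for raising an error when a
       work_mem-like limit is exceeded.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical bounded set: doubles its array, but only within a budget. */
    typedef struct BoundedStringSet
    {
        char      **items;
        size_t      count;          /* strings stored */
        size_t      capacity;       /* slots allocated in items */
        size_t      bytes_used;     /* array bytes plus string bytes */
        size_t      bytes_limit;    /* budget, analogous to work_mem */
    } BoundedStringSet;

    bool
    bss_init(BoundedStringSet *set, size_t bytes_limit)
    {
        const size_t initial_capacity = 4;
        size_t      array_bytes = initial_capacity * sizeof(char *);

        memset(set, 0, sizeof(*set));
        if (array_bytes > bytes_limit)
            return false;
        set->items = malloc(array_bytes);
        if (set->items == NULL)
            return false;
        set->capacity = initial_capacity;
        set->bytes_used = array_bytes;
        set->bytes_limit = bytes_limit;
        return true;
    }

    /* Add a copy of str; return false if that would exceed the budget. */
    bool
    bss_add(BoundedStringSet *set, const char *str)
    {
        size_t      len = strlen(str) + 1;
        size_t      new_capacity = set->capacity;
        size_t      growth;
        char       *copy;

        if (set->count == set->capacity)
            new_capacity = set->capacity * 2;
        growth = (new_capacity - set->capacity) * sizeof(char *);

        /* Check the budget before allocating anything. */
        if (growth + len > set->bytes_limit - set->bytes_used)
            return false;

        if (growth > 0)
        {
            char      **grown = realloc(set->items, new_capacity * sizeof(char *));

            if (grown == NULL)
                return false;
            set->items = grown;
            set->capacity = new_capacity;
            set->bytes_used += growth;
        }

        copy = malloc(len);
        if (copy == NULL)
            return false;
        memcpy(copy, str, len);
        set->items[set->count++] = copy;
        set->bytes_used += len;
        return true;
    }

    void
    bss_free(BoundedStringSet *set)
    {
        for (size_t i = 0; i < set->count; i++)
            free(set->items[i]);
        free(set->items);
        memset(set, 0, sizeof(*set));
    }
    ```

       With such a check in place, a pattern that generates too many
       candidate strings fails early with a clear error instead of
       exhausting memory.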
    
    2.2 Minor Issues (Review Needed)
    
    #1 [Code] nodeWindowAgg.c:5849,5909,5912
       pos > NUM_ALPHABETS check - intent unclear, causes error at 28 PATTERN elements
    
       Reproduction:
       - PATTERN (A B C ... Z A) (27 elements) -> OK
       - PATTERN (A B C ... Z A B) (28 elements) -> ERROR "initial is not valid char: b"
    
    2.3 Minor Issues (Style)
    
    #1 [Code] nodeWindowAgg.c (25 blocks)
       #ifdef RPR_DEBUG -> recommend elog(DEBUG2, ...) or remove
    
    #2 [Code] src/backend/executor/nodeWindowAgg.c
       static keyword on separate line (26 functions)
    
    #3 [Code] src/backend/utils/adt/ruleutils.c
       Whitespace typo: "regexp !=NULL" -> "regexp != NULL"
    
    #4 [Code] nodeWindowAgg.c:4619
       Error message case: "Unrecognized" -> "unrecognized" (PostgreSQL style)
    
    #5 [Doc] doc/src/sgml/ref/select.sgml:1128
       Typo: "maximu" -> "maximum"
    
    2.4 Suggestions for Discussion
    
    #1 Incremental matching with streaming NFA redesign
       Benefits:
       - O(n) time complexity per row (current: exponential in worst case)
       - Bounded memory via state merging and context absorption
       - Natural extension for OR patterns, {n,m} quantifiers, nested groups
       - Enables MEASURES clause with incremental aggregates
       - Proven approach in CEP engines (Flink, Esper)
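
       To make the idea concrete, here is a minimal sketch of NFA
       state-set simulation for one restricted pattern class only:
       PATTERN (V0+ V1+ ... Vk-1+), with at most 32 variables. It is an
       illustration under those assumptions, not the proposed redesign.
       The function name and the row_vars bitmask encoding (bit j set
       when the row satisfies variable j's DEFINE condition) are
       hypothetical. Each row is processed once with a constant-size set
       of active states, so no candidate strings are enumerated.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Return the length of the longest match of PATTERN (V0+ V1+ ... Vk-1+)
     * that starts at row 0, or 0 if there is no match.
     */
    size_t
    nfa_longest_match(const uint32_t *row_vars, size_t nrows, unsigned nvars)
    {
        uint32_t    mask;
        uint32_t    active;     /* bit j set: a path just consumed a row as Vj */
        size_t      longest = 0;
        size_t      i;

        if (nvars == 0 || nvars > 32 || nrows == 0)
            return 0;
        mask = (nvars == 32) ? UINT32_MAX : ((UINT32_C(1) << nvars) - 1);

        /* The first row can only be consumed as V0. */
        active = row_vars[0] & mask & UINT32_C(1);

        for (i = 0; active != 0; i++)
        {
            /* A path that has reached the last variable is a complete match. */
            if (active & (UINT32_C(1) << (nvars - 1)))
                longest = i + 1;
            if (i + 1 == nrows)
                break;

            /*
             * For the next row, each path either stays on its variable (the
             * '+' loop) or advances to the following one; keep only the
             * states whose condition the next row satisfies.
             */
            active = (active | (active << 1)) & row_vars[i + 1] & mask;
        }
        return longest;
    }
    ```

       For PATTERN (A+ B+ C+) with every condition TRUE, each row has
       mask 7 and the function returns the full frame length after one
       pass over the rows.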
    
    
    3. TEST COVERAGE
    ----------------
    
    3.1 Covered Areas
    
    - PATTERN clause: +, *, ? quantifiers (line 41, 71, 86)
    - DEFINE clause: Variable conditions, auto-TRUE for missing (line 120-133)
    - PREV/NEXT functions: Single argument (line 44, 173)
    - AFTER MATCH SKIP: TO NEXT ROW (line 182), PAST LAST ROW (line 198)
    - Aggregate integration: sum, avg, count, max, min (line 258-277)
    - Window function integration: first_value, last_value, nth_value (line 34-35)
    - JOIN/CTE: JOIN (line 313), WITH (line 324)
    - View: VIEW creation, pg_get_viewdef (line 390-406)
    - Error cases: 7 error scenarios (line 409-532)
    - Large partition: 5000 row smoke test (line 360-387)
    - ROWS BETWEEN offset: CURRENT ROW AND 2 FOLLOWING (line 244)
    
    3.2 Untested Items
    
    Feature                  Priority   Recommendation
    -------------------------------------------------------------------------------
    26 variable limit        Medium     Test 26 variables success, 27th variable error
    NULL value handling      Low        Test PREV(col) where col or previous row is NULL
    
    Sample test for 26 variable limit:
    
        -- Should fail with 27th variable (parser error, no table needed)
        SELECT * FROM (SELECT 1 AS x) t
        WINDOW w AS (
          ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
          PATTERN (a b c d e f g h i j k l m n o p q r s t u v w x y z aa)
          DEFINE a AS TRUE, b AS TRUE, c AS TRUE, d AS TRUE, e AS TRUE, f AS TRUE,
                 g AS TRUE, h AS TRUE, i AS TRUE, j AS TRUE, k AS TRUE, l AS TRUE,
                 m AS TRUE, n AS TRUE, o AS TRUE, p AS TRUE, q AS TRUE, r AS TRUE,
                 s AS TRUE, t AS TRUE, u AS TRUE, v AS TRUE, w AS TRUE, x AS TRUE,
                 y AS TRUE, z AS TRUE, aa AS TRUE
        );
        -- ERROR: number of row pattern definition variable names exceeds 26
    
    Sample test for NULL handling:
    
        CREATE TEMP TABLE stock_null (company TEXT, tdate DATE, price INTEGER);
        INSERT INTO stock_null VALUES ('c1', '2023-07-01', 100);
        INSERT INTO stock_null VALUES ('c1', '2023-07-02', NULL);  -- NULL in middle
        INSERT INTO stock_null VALUES ('c1', '2023-07-03', 200);
        INSERT INTO stock_null VALUES ('c1', '2023-07-04', 150);
    
        SELECT company, tdate, price, count(*) OVER w AS match_count
        FROM stock_null
        WINDOW w AS (
          PARTITION BY company
          ORDER BY tdate
          ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
          PATTERN (START UP DOWN)
          DEFINE START AS TRUE, UP AS price > PREV(price),
                 DOWN AS price < PREV(price)
        );
        -- Result: all rows show match_count = 0 (NULL breaks pattern matching)
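
    The expected result follows from SQL three-valued logic: a comparison
    with NULL yields NULL, and the sketch below assumes a DEFINE
    condition counts as satisfied only when it evaluates to true. The
    types and function names are hypothetical and model only the fixed
    pattern (START UP DOWN) from the sample, with START always true.

    ```c
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct NullableInt
    {
        bool        isnull;
        int         value;
    } NullableInt;

    typedef enum TriBool
    {
        TRI_FALSE,
        TRI_TRUE,
        TRI_NULL
    } TriBool;

    /* SQL-style "a < b": NULL if either side is NULL. */
    TriBool
    tri_less(NullableInt a, NullableInt b)
    {
        if (a.isnull || b.isnull)
            return TRI_NULL;
        return a.value < b.value ? TRI_TRUE : TRI_FALSE;
    }

    /* Does PATTERN (START UP DOWN) match the three rows beginning at start? */
    bool
    matches_start_up_down(const NullableInt *price, size_t nrows, size_t start)
    {
        if (start + 2 >= nrows)
            return false;       /* fewer than three rows remain */

        /* UP: price > PREV(price); a NULL result is not a match */
        if (tri_less(price[start], price[start + 1]) != TRI_TRUE)
            return false;

        /* DOWN: price < PREV(price) */
        return tri_less(price[start + 2], price[start + 1]) == TRI_TRUE;
    }
    ```

    With prices 100, NULL, 200, 150, no starting row yields a match,
    whereas 100, 200, 150 without the NULL row matches from the first
    row.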
    
    3.3 Unimplemented Features (No Test Needed)
    
    Per documentation (select.sgml:1133-1136):
    - MEASURES clause: Not implemented
    - SUBSET clause: Not implemented
    - AFTER MATCH variants: Only SKIP TO NEXT ROW, PAST LAST ROW supported
    
    Not in documentation, verified by testing:
    - {n,m} quantifier: Parser error ("syntax error at or near {")
    - (A|B) OR pattern: Not supported (parsed as single variable name "a|b")
    - (A B)+ compound repetition: Parser accepts, but execution returns 0 matches
    
    
    4. CODE COVERAGE ANALYSIS
    -------------------------
    
    4.1 Overall Coverage
    
    96.4% (732 / 759 lines)
    
    4.2 Coverage by File (major RPR-modified files)
    
    nodeWindowAgg.c:    96.6% (560/580) - Pattern matching execution engine
    parse_clause.c:     96.7% (88/91) - PATTERN/DEFINE analysis
    ruleutils.c:        93.1% (54/58) - pg_get_viewdef output
    
    4.3 Uncovered Code Risk Assessment
    
    Total: 27 lines uncovered
    
    Medium Risk (2 items) - Test addition recommended (see section 3.2):
    - parse_clause.c:4093: transformDefineClause - 27th variable error
    - nodeWindowAgg.c:5623: get_slots - null_slot for PREV at partition first row
    
    Low Risk (25 items) - Defensive code / Unreachable:
    - nodeWindowAgg.c:3074: attno_map_walker - Parser validates arg count
    - nodeWindowAgg.c:3137-3138: rpr_navigation_walker - switch default
    - nodeWindowAgg.c:3566: window_gettupleslot - Before mark position
    - nodeWindowAgg.c:4289: WinGetFuncArgInFrame - isout flag
    - nodeWindowAgg.c:4393: WinGetSlotInFrame - Boundary check
    - nodeWindowAgg.c:4618-4619: row_is_in_reduced_frame - switch default
    - nodeWindowAgg.c:4697: register_reduced_frame_map - pos < 0 check
    - nodeWindowAgg.c:5007: search_str_set - NULL return continue
    - nodeWindowAgg.c:5405: do_pattern_match - Regex compile error
    - nodeWindowAgg.c:5435,5437-5438: do_pattern_match - Regex exec error
    - nodeWindowAgg.c:5700: pattern_initial - Variable not found
    - nodeWindowAgg.c:5776: string_set_get - Index range check
    - nodeWindowAgg.c:5850: variable_pos_register - a-z range check
    - nodeWindowAgg.c:5910: variable_pos_fetch - a-z range check
    - nodeWindowAgg.c:5913: variable_pos_fetch - Index range check
    - parse_clause.c:3989: transformDefineClause - A_Expr type check
    - parse_clause.c:4145: transformPatternClause - A_Expr type check
    - ruleutils.c:6904-6908: get_rule_windowspec - SKIP TO NEXT ROW output
    
    Conclusion: Most uncovered code consists of defensive error handling or
    code unreachable due to parser pre-validation. No security or functional
    risk.
    
    
    5. COMMIT ANALYSIS
    ------------------
    
    8 sequential commits:
    
    Commit  Area            Files   +/-         Key Content
    -------------------------------------------------------------------------------
    1       Raw Parser      4       +174/-16    gram.y grammar (PATTERN/DEFINE)
    2       Parse/Analysis  4       +277/-1     parse_clause.c analysis logic
    3       Rewriter        1       +109/-0     pg_get_viewdef extension
    4       Planner         5       +73/-3      WindowAgg plan node extension
    5       Executor        4       +1,942/-11  CORE: Pattern matching engine (+1,850)
    6       Docs            3       +192/-7     advanced.sgml, func-window.sgml
    7       Tests           3       +1,585/-1   rpr.sql (532), rpr.out (1,052)
    8       typedefs        1       +6/-0       pgindent support
    
    Code Change Statistics:
    - Total files: 25
    - Lines added: 4,358
    - Lines deleted: 39
    - Net increase: +4,319 lines
    
    
    6. RECOMMENDATIONS
    ------------------
    
    6.1 Combinatorial Explosion Prevention (Major, Required)
    
    Add work_mem-based memory limit for StringSet allocation.
    Location: string_set_add() in nodeWindowAgg.c:5741-5750
    Consistent with existing PostgreSQL approach (Hash Join, Sort, etc.)
    
    6.2 Code Review Required (Minor, Decision Needed)
    
    Location: nodeWindowAgg.c:5849,5909,5912
    Issue: pos > NUM_ALPHABETS check intent unclear
    Current: PATTERN with 28 elements causes error
    Question: Is 27 element limit intentional?
    
    6.3 Code Style Fixes (Minor)
    
    - #ifdef RPR_DEBUG: Use elog(DEBUG2, ...) or remove (25 blocks)
    - static keyword on separate line: 26 functions to fix
    - Whitespace: "regexp !=NULL" -> "regexp != NULL"
    - Error message case: "Unrecognized" -> "unrecognized"
    
    6.4 Documentation Fixes (Minor)
    
    - select.sgml: "maximu" -> "maximum"
    
    6.5 Test Additions (Optional)
    
    Black-box Tests (Functional):
    
    Feature              Test Case                                    Priority
    -------------------------------------------------------------------------------
    Variable limit       26 variables success, 27 error               Medium
    NULL boundary        PREV at partition first row                  Medium
    
    White-box Tests (Coverage) - covered by above black-box tests:
    
    File:Line                          Target Branch
    -------------------------------------------------------------------------------
    parse_clause.c:4093                Limit error branch (Variable limit test)
    nodeWindowAgg.c:5623               null_slot usage (NULL boundary test)
    
    
    7. CONCLUSION
    -------------
    
    Test Quality: GOOD
    
    Core functionality is thoroughly tested with comprehensive error case
    coverage.
    
    The patch is well-implemented within its defined scope. Identified issues
    include one major concern (combinatorial explosion) and minor
    style/documentation items.
    
    Recommended actions before commit:
    1. Add work_mem-based memory limit for pattern combinations (required)
    2. Clarify pos > NUM_ALPHABETS check intent (review needed)
    3. Fix code style issues (optional but recommended)
    4. Fix documentation typo (trivial)
    5. Add tests for variable limit and NULL handling (optional)
    
    Points for discussion (optional):
    6. Incremental matching with streaming NFA redesign
    
    
    Attachment:
    - coverage.tgz (gcov HTML report, RPR-modified code only)
    
    
    ---
    End of Report
    
    On Wed, Sep 24, 2025 at 7:36 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
    
    > Attached are the v33 patches for Row pattern recognition.  The
    > difference from v32 is that the raw parse tree printing patch is not
    > included in v33. This is because now that we have
    > debug_print_raw_parse.
    >
    > Best regards,
    > --
    > Tatsuo Ishii
    > SRA OSS K.K.
    > English: http://www.sraoss.co.jp/index_en/
    > Japanese:http://www.sraoss.co.jp
    >