Re: Row pattern recognition
Tatsuo Ishii <ishii@sraoss.co.jp>
From: Tatsuo Ishii <ishii@sraoss.co.jp>
To: vik@postgresfriends.org, jchampion@timescale.com
Cc: pgsql-hackers@postgresql.org
Date: 2023-07-26T12:21:34Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Add temporal FOREIGN KEY contraints
- 89f908a6d0ac 18.0 cited
-
Remove obsolete executor cleanup code
- d060e921ea5a 17.0 cited
Attachments
- v3-0001-Row-pattern-recognition-patch-for-raw-parser.patch (text/x-patch)
Attached is the v3 patch. In this patch following changes are made. (1) I completely changed the pattern matching engine so that it performs backtracking. Now the engine evaluates all pattern elements defined in PATTER against each row, saving matched pattern variables in a string per row. For example if the pattern element A and B evaluated to true, a string "AB" is created for current row. This continues until all pattern matching fails or encounters the end of full window frame/partition. After that, the pattern matching engine creates all possible "pattern strings" and apply the regular expression matching to each. For example if we have row 0 = "AB" row 1 = "C", possible pattern strings are: "AC" and "BC". If it matches, the length of matching substring is saved. After all possible trials are done, the longest matching substring is chosen and it becomes the width (number of rows) in the reduced window frame. See row_is_in_reduced_frame, search_str_set and search_str_set_recurse in nodeWindowAggs.c for more details. For now I use a naive depth first search and probably there is a lot of rooms for optimization (for example rewriting it without using recursion). Suggestions/patches are welcome. Jacob Champion wrote: > It looks like the + qualifier has trouble when it meets the end of the > frame. For instance, try removing the last row of the 'stock' table in > rpr.sql; some of the final matches will disappear unexpectedly. Or try a > pattern like > > PATTERN ( a+ ) > DEFINE a AS TRUE > > which doesn't seem to match anything in my testing. > > There's also the issue of backtracking in the face of reclassification, > as I think Vik was alluding to upthread. The pattern > > PATTERN ( a+ b+ ) > DEFINE a AS col = 2, > b AS col = 2 With the new engine, cases above do not fail anymore. See new regression test cases. Thanks for providing valuable test cases! (2) Make window functions RPR aware. Now first_value, last_value, and nth_value recognize RPR (maybe first_value do not need any change?) Vik Fearing wrote: > I strongly disagree with this. Window function do not need to know > how the frame is defined, and indeed they should not. > WinGetFuncArgInFrame should answer yes or no and the window function > just works on that. Otherwise we will get extension (and possibly even > core) functions that don't handle the frame properly. So I moved row_is_in_reduce_frame into WinGetFuncArgInFrame so that those window functions are not needed to be changed. (3) Window function rpr was removed. We can use first_value instead. (4) Remaining tasks/issues. - For now I disable WinSetMarkPosition because RPR often needs to access a row before the mark is set. We need to fix this in the future. - I am working on making window aggregates RPR aware now. The implementation is in progress and far from completeness. An example is below. I think row 2, 3, 4 of "count" column should be NULL instead of 3, 2, 0, 0. Same thing can be said to other rows. Probably this is an effect of moving aggregate but I still studying the window aggregation code. SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM stock WINDOW w AS ( PARTITION BY company ORDER BY tdate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING AFTER MATCH SKIP PAST LAST ROW INITIAL PATTERN (START UP+ DOWN+) DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price < PREV(price) ); company | tdate | first_value | count ----------+------------+-------------+------- company1 | 2023-07-01 | 100 | 4 company1 | 2023-07-02 | | 3 company1 | 2023-07-03 | | 2 company1 | 2023-07-04 | | 0 company1 | 2023-07-05 | | 0 company1 | 2023-07-06 | 90 | 4 company1 | 2023-07-07 | | 3 company1 | 2023-07-08 | | 2 company1 | 2023-07-09 | | 0 company1 | 2023-07-10 | | 0 company2 | 2023-07-01 | 50 | 4 company2 | 2023-07-02 | | 3 company2 | 2023-07-03 | | 2 company2 | 2023-07-04 | | 0 company2 | 2023-07-05 | | 0 company2 | 2023-07-06 | 60 | 4 company2 | 2023-07-07 | | 3 company2 | 2023-07-08 | | 2 company2 | 2023-07-09 | | 0 company2 | 2023-07-10 | | 0 - If attributes appearing in DEFINE are not used in the target list, query fails. SELECT company, tdate, count(*) OVER w FROM stock WINDOW w AS ( PARTITION BY company ORDER BY tdate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING AFTER MATCH SKIP PAST LAST ROW INITIAL PATTERN (START UP+ DOWN+) DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price < PREV(price) ); ERROR: attribute number 3 exceeds number of columns 2 This is because attributes appearing in DEFINE are not added to the target list. I am looking for way to teach planner to add attributes appearing in DEFINE. I am going to add this thread to CommitFest and plan to add both of you as reviewers. Thanks in advance. -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp