Re: Row pattern recognition
Tatsuo Ishii <ishii@postgresql.org>
From: Tatsuo Ishii <ishii@postgresql.org>
To: david.g.johnston@gmail.com, vik@postgresfriends.org,
jacob.champion@enterprisedb.com, er@xs4all.nl, peter@eisentraut.org
Cc: pgsql-hackers@postgresql.org
Date: 2024-12-30T13:37:18Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Add temporal FOREIGN KEY contraints
- 89f908a6d0ac 18.0 cited
-
Remove obsolete executor cleanup code
- d060e921ea5a 17.0 cited
Attachments
- v26-0001-Row-pattern-recognition-patch-for-raw-parser.patch (text/x-patch)
I have added further optimization to the v25 patch. While generating possible input strings that may satisfy the pattern string, it is possible to omit to run regexp in some cases. Since regexp matching is heavy operation, especially if it is applied to long string, it is beneficial for RPR to reduce the number of regexp runs. If the tail pattern variable has '+' quantifier and previously the input string was confirmed to be matched the pattern string, and the same character as the tail pattern string is added, we don't need run regexp match again because the new input string surely matches the pattern string. Suppose a pattern string is "ab+" and the current input string is "ab" (this satisfies "ab+"). If the new input string is "abb", then "abb" surely matches "ab+" too and we don't need to run regexp again. In v26 patch, by using the technique above I get performance improvement. >> EXPLAIN (ANALYZE) >> SELECT aid, bid, count(*) OVER w >> FROM pgbench_accounts WHERE aid <= 10000 >> WINDOW w AS ( >> PARTITION BY bid >> ORDER BY aid >> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING >> AFTER MATCH SKIP PAST LAST ROW >> INITIAL >> PATTERN (START UP+) >> DEFINE >> START AS TRUE, >> UP AS aid > PREV(aid) >> ); This SQL took 322.5913 ms (average in 3 runs) in v24. With v26 patch, it takes 41.84 ms, which is over 7 times improvement. Also I run the SQL in 100k row case. v23 took 26 seconds. With the v26 patch it takes 1195.603 ms, which is over 21 times improvement. I think a pattern string ended up with '+' is one of common use cases of RPR, and I believe the improvement is useful for many RPR applications. I also add following changes to v25. - Fix do_pattern_match to use the top memory context to store compiled re cache. Before it was in per query memory context. This causes a trouble because do_pattern_match checks the cache existence using a static variable. - Refactor search_str_set, which is a workhorse of pattern matching, into multiple functions to understand the logic easier. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp