Re: Row pattern recognition
Tatsuo Ishii <ishii@postgresql.org>
From: Tatsuo Ishii <ishii@postgresql.org>
To: david.g.johnston@gmail.com, vik@postgresfriends.org,
jacob.champion@enterprisedb.com, er@xs4all.nl, peter@eisentraut.org
Cc: pgsql-hackers@postgresql.org
Date: 2024-12-30T23:57:07Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Add temporal FOREIGN KEY contraints
- 89f908a6d0ac 18.0 cited
-
Remove obsolete executor cleanup code
- d060e921ea5a 17.0 cited
Attachments
- v27-0001-Row-pattern-recognition-patch-for-raw-parser.patch (text/x-patch)
> I have added further optimization to the v25 patch. > > While generating possible input strings that may satisfy the pattern > string, it is possible to omit to run regexp in some cases. Since > regexp matching is heavy operation, especially if it is applied to > long string, it is beneficial for RPR to reduce the number of regexp > runs. > > If the tail pattern variable has '+' quantifier and previously the > input string was confirmed to be matched the pattern string, and the > same character as the tail pattern string is added, we don't need run > regexp match again because the new input string surely matches the > pattern string. Suppose a pattern string is "ab+" and the current > input string is "ab" (this satisfies "ab+"). If the new input string > is "abb", then "abb" surely matches "ab+" too and we don't need to run > regexp again. > > In v26 patch, by using the technique above I get performance > improvement. > >>> EXPLAIN (ANALYZE) >>> SELECT aid, bid, count(*) OVER w >>> FROM pgbench_accounts WHERE aid <= 10000 >>> WINDOW w AS ( >>> PARTITION BY bid >>> ORDER BY aid >>> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING >>> AFTER MATCH SKIP PAST LAST ROW >>> INITIAL >>> PATTERN (START UP+) >>> DEFINE >>> START AS TRUE, >>> UP AS aid > PREV(aid) >>> ); > > This SQL took 322.5913 ms (average in 3 runs) in v24. With v26 patch, > it takes 41.84 ms, which is over 7 times improvement. Also I run the > SQL in 100k row case. v23 took 26 seconds. With the v26 patch it takes > 1195.603 ms, which is over 21 times improvement. > > I think a pattern string ended up with '+' is one of common use cases > of RPR, and I believe the improvement is useful for many RPR > applications. > > I also add following changes to v25. > > - Fix do_pattern_match to use the top memory context to store compiled > re cache. Before it was in per query memory context. This causes a > trouble because do_pattern_match checks the cache existence using > a static variable. > > - Refactor search_str_set, which is a workhorse of pattern matching, > into multiple functions to understand the logic easier. CFBot complains a compiler error in the v26 patch. Attached is v27 patch to fix this. Also some typo in comment are fixed. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp