Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org>

From: Tatsuo Ishii <ishii@postgresql.org>

To: david.g.johnston@gmail.com, vik@postgresfriends.org, jacob.champion@enterprisedb.com, er@xs4all.nl, peter@eisentraut.org

Cc: pgsql-hackers@postgresql.org

Date: 2024-12-30T23:57:07Z

Lists: pgsql-hackers

Commits

Add temporal FOREIGN KEY contraints
- 89f908a6d0ac 18.0 cited
Remove obsolete executor cleanup code
- d060e921ea5a 17.0 cited

Attachments

v27-0001-Row-pattern-recognition-patch-for-raw-parser.patch (text/x-patch)

> I have added further optimization to the v25 patch.
> 
> While generating possible input strings that may satisfy the pattern
> string, it is possible to skip running the regexp in some cases.
> Since regexp matching is a heavy operation, especially when it is
> applied to a long string, it is beneficial for RPR to reduce the
> number of regexp runs.
> 
> If the tail pattern variable has a '+' quantifier, the input string
> was previously confirmed to match the pattern string, and the same
> character as the tail of the pattern string is appended, we don't
> need to run the regexp match again because the new input string
> surely matches the pattern string. Suppose the pattern string is
> "ab+" and the current input string is "ab" (this satisfies "ab+").
> If the new input string is "abb", then "abb" surely matches "ab+"
> too and we don't need to run the regexp again.
> 
> In the v26 patch, using the technique above yields a performance
> improvement.
> 
>>> EXPLAIN (ANALYZE)
>>> SELECT aid, bid, count(*) OVER w
>>> FROM pgbench_accounts WHERE aid <= 10000
>>> WINDOW w AS (
>>> PARTITION BY bid
>>> ORDER BY aid
>>> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>> AFTER MATCH SKIP PAST LAST ROW
>>> INITIAL
>>> PATTERN (START UP+)
>>> DEFINE
>>> START AS TRUE,
>>> UP AS aid > PREV(aid)
>>> );
> 
> This SQL took 322.5913 ms (average of 3 runs) in v24. With the v26
> patch, it takes 41.84 ms, which is over 7 times faster. I also ran
> the SQL in the 100k row case. v23 took 26 seconds. With the v26 patch
> it takes 1195.603 ms, which is over 21 times faster.
> 
> I think a pattern string ending with '+' is one of the common use
> cases of RPR, and I believe the improvement is useful for many RPR
> applications.
> 
> I also made the following changes to v25.
> 
> - Fix do_pattern_match to use the top memory context to store the
>   compiled re cache. Before, it was in the per-query memory context.
>   This caused trouble because do_pattern_match checks for the cache's
>   existence using a static variable.
> 
> - Refactor search_str_set, which is the workhorse of pattern matching,
>   into multiple functions to make the logic easier to understand.
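
For readers following the thread, the '+' tail shortcut described in the
quoted text can be sketched roughly as below. This is only an
illustration in Python, not the patch's C code: the function name
match_with_tail_shortcut is hypothetical, the pattern is assumed to be a
plain string of letters ending in "<letter>+", and whole-string matching
is assumed. It counts how many times the regexp engine is actually
invoked.

```python
import re
from typing import List, Tuple


def match_with_tail_shortcut(pattern: str, inputs: List[str]) -> Tuple[List[bool], int]:
    """Match each candidate string against pattern, in order.

    Hypothetical helper (not from the patch). Returns the match results
    and the number of times the regexp engine was actually run.
    """
    compiled = re.compile(pattern)

    # Assumption: the shortcut applies only when the pattern ends in
    # "<letter>+", e.g. "ab+". Groups or escapes before '+' are not handled.
    tail_char = None
    if len(pattern) >= 2 and pattern.endswith("+") and pattern[-2].isalpha():
        tail_char = pattern[-2]

    results = []
    regexp_runs = 0
    prev_input = None
    prev_matched = False

    for s in inputs:
        if (tail_char is not None and prev_matched
                and s == prev_input + tail_char):
            # The previous string matched and we only appended the tail
            # character, so the new string must match too: skip the regexp.
            matched = True
        else:
            regexp_runs += 1
            matched = compiled.fullmatch(s) is not None
        results.append(matched)
        prev_input, prev_matched = s, matched

    return results, regexp_runs


results, regexp_runs = match_with_tail_shortcut(
    "ab+", ["a", "ab", "abb", "abbb", "abc"])
print(results, regexp_runs)
```

With the inputs above, "abb" and "abbb" are accepted without invoking
the regexp engine, since each one extends a string already known to
match by the tail character "b".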

CFBot complains about a compiler error in the v26 patch.
Attached is the v27 patch, which fixes it. Some typos in comments are also fixed.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp