Re: Row pattern recognition
Tatsuo Ishii <ishii@postgresql.org>
From: Tatsuo Ishii <ishii@postgresql.org>
To: david.g.johnston@gmail.com, vik@postgresfriends.org,
jacob.champion@enterprisedb.com, er@xs4all.nl, peter@eisentraut.org
Cc: pgsql-hackers@postgresql.org
Date: 2024-12-19T06:19:50Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Add temporal FOREIGN KEY contraints
- 89f908a6d0ac 18.0 cited
-
Remove obsolete executor cleanup code
- d060e921ea5a 17.0 cited
Attachments
- v24-0001-Row-pattern-recognition-patch-for-raw-parser.patch (text/x-patch)
I have looked into the performance of current RPR implementation, especially when the number of rows in a reduced frame is large (like over 10k). Below is a simple benchmark against pgbench database. The SQL will produce a reduced frame having 10k rows. EXPLAIN (ANALYZE) SELECT aid, bid, count(*) OVER w FROM pgbench_accounts WHERE aid <= 10000 WINDOW w AS ( PARTITION BY bid ORDER BY aid ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING AFTER MATCH SKIP PAST LAST ROW INITIAL PATTERN (START UP+) DEFINE START AS TRUE, UP AS aid > PREV(aid) ); This took 722 ms on my laptop. It's not very quick. Moreover, if I expand the reduced frame size to 100k (aid <= 100000), OOM killer triggered. I looked into the code and found that do_pattern_match in nodeWindowAgg.c is one of the major problems. It calls regexp_instr to know whether the regular expression derived from a PATTERN clause (e.g. "ab+c+") matches an encoded row pattern variable string (e.g. "abbcc"). The latter string could be quite long: the length could be as same as the number of rows in the reduced frame. Thus, The length could become 100k if the frame size is 100k. Unfortunately regexp_instr needs to allocate and convert the input string to wchar (it's 4-byte long for each character), which uses 4x space bigger than the original input string. In RPR case the input string is always ASCII and does not need to be converted to wchar. So I decided to switch to the standard regular expression engine coming with OS. With this change, I got 2x speed up in the 10k case. v23 patch: 722.618 ms (average of 3 runs) new patch: 322.5913 ms (average of 3 runs) Also I tried the 100k rows reduced frame case. It was slow (took 26 seconds) but it completed without OOM killer. Attached is the patch. The change was in 0005 only. Other patches were not changed from v23. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp