Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Peter Geoghegan <pg@bowt.ie>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
nbtree: Always set skipScan flag on rescan.
- 454c046094ab 19 (unreleased) landed
- bee763aea13f 18.0 landed
-
meson: Build numeric.c with -ftree-vectorize.
- 9016fa7e3bcd 19 (unreleased) cited
-
Fix "variable not found in subplan target lists" in semijoin de-duplication.
- b8a1bdc458e3 19 (unreleased) cited
-
Revert "nbtree: Remove useless row compare arg."
- dd2ce3792754 18.0 landed
-
nbtree: Remove useless row compare arg.
- 54c6ea8c81db 18.0 cited
-
Prevent premature nbtree array advancement.
- 5f4d98d4f371 18.0 landed
-
nbtree: tighten up array recheck rules.
- 7e25c9363a82 18.0 landed
-
Avoid treating nonrequired nbtree keys as required.
- 0f08df406822 18.0 landed
-
Adjust overstrong nbtree skip array assertion.
- 9d924dbb3710 18.0 landed
-
Make NULL tuple values always advance skip arrays.
- b75fedcab791 18.0 cited
-
Avoid extra index searches through preprocessing.
- b3f1a13f22f9 18.0 landed
-
Improve nbtree skip scan primitive scan scheduling.
- 21a152b37f36 18.0 landed
-
Further optimize nbtree search scan key comparisons.
- 8a510275dd6b 18.0 landed
-
Add nbtree skip scan optimization.
- 92fe23d93aa3 18.0 landed
-
Improve nbtree array primitive scan scheduling.
- 9a2e2a285a14 18.0 landed
-
nbtree: Make BTMaxItemSize into object-like macro.
- 426ea611171d 18.0 landed
-
Show index search count in EXPLAIN ANALYZE, take 2.
- 0fbceae841cb 18.0 landed
-
Make parallel nbtree index scans use an LWLock.
- 67fc4c9fd7fa 18.0 landed
-
Show index search count in EXPLAIN ANALYZE.
- 5ead85fbc811 18.0 landed
-
Avoid nbtree parallel scan currPos confusion.
- b5ee4e52026b 18.0 cited
-
nbtree: Remove useless 'strat' local variable.
- b6558e4f837e 18.0 landed
-
Normalize nbtree truncated high key array behavior.
- 79fa7b3b1a44 18.0 landed
-
Refactor handling of nbtree array redundancies.
- b524974106ac 18.0 landed
-
Fix nbtree pgstats accounting with parallel scans.
- c00c54a9ac1e 18.0 landed
- fb4f5e58af97 17.0 landed
-
Avoid parallel nbtree index scan hangs with SAOPs.
- d8adfc18bebf 18.0 landed
- a24bffc021d9 17.0 landed
-
Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE
- 5a1e6df3b84c 18.0 cited
-
Enhance nbtree ScalarArrayOp execution.
- 5bf748b86bc6 17.0 cited
-
Skip checking of scan keys required for directional scan in B-tree
- e0b1ee17dc3a 17.0 cited
-
Instead of using a numberOfRequiredKeys count to distinguish required
- 7ccaf13a06b8 8.2.0 cited
Attachments
- v18-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch (application/octet-stream) patch v18-0001
- v18-0003-Add-skipskip-nbtree-skip-scan-optimization.patch (application/octet-stream) patch v18-0003
- v18-0002-Add-skip-scan-to-nbtree.patch (application/octet-stream) patch v18-0002
On Mon, Nov 25, 2024 at 4:07 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > The approach looks good to me. In fact, the significant regressions I > reported have disappeared in my environment. And the test_for_v17.sql > shows that the performance of the master and the master with your > patches > is almost same. That's great. I'm also a bit worried about regressing queries that don't even involve skip arrays -- particularly very simple queries. I'm thinking of things like "pgbench -S" style SELECT queries. Adding almost any new code to a hot code path such as _bt_check_compare comes with a risk of such regressions. (Up until now it has been convenient to pretend that "skipscan_prefix_cols=0" is 100% representative of master, but obviously it is less than 100% representative -- since "skipscan_prefix_cols=0" still has enlarged object code size in places like _bt_check_compare.) Attached is v18, which makes _bt_check_compare copy all relevant scan key fields into local variables. This gives the compiler more freedom due to not having to worry about aliasing. I believe that this change to _bt_check_compare fixes some regressions with simple queries. This includes a "pgbench -S" variant that I previously used during the Postgres 17 SAOP array work -- a script that showed a favorable case for that 17 work, involving "SELECT aid FROM pgbench_accounts WHERE aid IN (....)", with hundreds of contiguous array elements. More importantly, it avoids regressions with standard "pgbench -S" SELECT queries. It takes a long time to validate changes such as these -- the regressions are within the range of noise, so it is difficult to tell the difference between signal and noise. I probably went further with some of the new changes to _bt_check_compare than really made sense. For example, I probably added likely() or unlikely() annotations that don't really help. This is something that I will continue to refine. > One thing I am concerned about is that it reduces the cases where the > optimization of _bt_checkkeys_look_ahead() works effectively, as the > approach > skips the skip immediately on the first occurrence per page. I noticed that with the recent v17 revision of the patch, my original MDAM paper "sales_mdam_paper" test case (the complicated query in the introductory email of this thread) was about 2x slower. That's just not okay, obviously. But the issue was relatively easy to fix: it was fixed by making _bt_readpage not apply the "skipskip" optimization when on the first page for the current primitive index scan -- we already do this with the "precheck" optimization, so it's natural to do it with the "skipskip" optimization as well. The "sales_mdam_paper" test case involves thousands of primitive index scans that each access only one leaf page. But each leaf page returns 2 non-adjoining tuples, with quite a few non-matching tuples "in between" the matching tuples. There is one matching tuple for "store = 200", and another for "store = 250" -- and there's non-matching stores 201 - 249 between these two, which we want _bt_checkkeys_look_ahead to skip over. This is exactly the kind of case where the _bt_checkkeys_look_ahead() optimization is expected to help. > -- performance > SET skipscan_prefix_cols=32; > EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT, SETTINGS, WAL, VERBOSE) SELECT * > FROM t WHERE id2 = 360; -- cache > EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT, SETTINGS, WAL, VERBOSE) SELECT * > FROM t WHERE id2 = 360; > SET skipscan_prefix_cols=0; > EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT, SETTINGS, WAL, VERBOSE) SELECT * > FROM t WHERE id2 = 360; > > Again, the above results are provided for reference, as I believe that > many users prioritize stability and I'd like to take your new approach. Adversarial cases specifically designed to "make the patch look bad" are definitely useful review feedback. Ideally, the patch will be 100% free of regressions -- no matter how unlikely (or even silly) they may seem. I always prefer to not have to rely on anybody's opinion of what is likely or unlikely. :-) A quick test seems to show that this particular regression is more or less fixed by v18. As you said, the _bt_checkkeys_look_ahead stuff is the issue here (same with the MDAM sales query). You should confirm that the issue has actually been fixed, though. Thanks -- Peter Geoghegan