Re: Moving _bt_readpage and _bt_checkkeys into a new .c file
From: Peter Geoghegan <pg@bowt.ie>
To: Victor Yegorov <vyegorov@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-12-07T02:44:46Z
Lists: pgsql-hackers
Commits
- Avoid pointer chasing in _bt_readpage inner loop.
  - 83a26ba59b18 19 (unreleased) landed
- Relocate _bt_readpage and related functions.
  - 65d6acbc5649 19 (unreleased) landed
- Fix bug in nbtree array primitive scan scheduling.
  - 763d65ae2545 18.0 cited
Attachments
- 0003-Use-ignore_killed_tuples-local-variable.txt (text/plain)
- v1-plus-ignore_killed_tuples-change.out (application/octet-stream)
- v1-only.out (application/octet-stream)
On Sat, Dec 6, 2025 at 3:04 PM Peter Geoghegan <pg@bowt.ie> wrote:
> My best guess is that the benefits I see come from eliminating a
> dependent load. Without the second patch applied, I see this
> disassembly for _bt_checkkeys:
>
> mov rax,QWORD PTR [rdi+0x38] ; Load scan->opaque
> mov r15d,DWORD PTR [rax+0x70] ; Load so->dir
>
> A version with the second patch applied still loads a pointer passed
> by the _bt_checkkeys caller (_bt_readpage), but doesn't have to chase
> another pointer to get to it. Maybe this significantly ameliorates
> execution port pressure in the cases where I see a speedup?

I found a way to further speed up the queries that the second patch
already helped with, following profiling with perf: if _bt_readpage
takes a local copy of scan->ignore_killed_tuples when first called, and
then uses that local copy within its per-tuple loop (instead of using
scan->ignore_killed_tuples directly), it gives me an additional 1%
speedup over what I reported earlier today. In other words, the
range/BETWEEN pgbench variant I summarized earlier today goes from being
about 4.5% faster than master to being about 5.5% faster than master.

Testing has also shown that the ignore_killed_tuples enhancement doesn't
significantly change the picture with other types of queries (such as
the default pgbench SELECT).

In short, this ignore_killed_tuples change makes the second patch from
v1 more effective, seemingly by further ameliorating the same
bottleneck. Apparently accessing scan->ignore_killed_tuples created
another load-use hazard in the same tight inner loop (the per-tuple
_bt_readpage loop). This matters with these queries, where we don't need
to do very much work per-tuple (_bt_readpage's pstate.startikey
optimization is as effective as possible here) and have quite a few
tuples (2,000 tuples) that need to be returned by each test query run.

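To make the local-copy idea concrete, here is a minimal standalone sketch in C. It is not the patch itself: DemoScan, DemoItem, and both functions are hypothetical stand-ins invented for illustration, and the loop body is far simpler than the real per-tuple loop. The first function reads the flag through the scan pointer on every iteration; the second copies it into a local once, before the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scan descriptor (not the real struct) */
typedef struct DemoScan
{
	bool		ignore_killed_tuples;
} DemoScan;

/* Hypothetical stand-in for an item on an index page */
typedef struct DemoItem
{
	bool		dead;
	int			value;
} DemoItem;

/* Reads scan->ignore_killed_tuples through the pointer on each iteration */
static size_t
count_live_direct(const DemoScan *scan, const DemoItem *items, size_t nitems)
{
	size_t		nlive = 0;

	assert(scan != NULL);
	for (size_t i = 0; i < nitems; i++)
	{
		if (scan->ignore_killed_tuples && items[i].dead)
			continue;			/* skip items marked dead */
		nlive++;
	}
	return nlive;
}

/* Copies the flag into a local once, then uses only the local in the loop */
static size_t
count_live_hoisted(const DemoScan *scan, const DemoItem *items, size_t nitems)
{
	size_t		nlive = 0;
	bool		ignore_killed_tuples;

	assert(scan != NULL);
	ignore_killed_tuples = scan->ignore_killed_tuples;	/* single load */
	for (size_t i = 0; i < nitems; i++)
	{
		if (ignore_killed_tuples && items[i].dead)
			continue;			/* skip items marked dead */
		nlive++;
	}
	return nlive;
}
```

Both functions return the same result; only the number of loads through the scan pointer differs. In a toy loop like this one a compiler may well hoist the load on its own. The assumption behind the sketch is that a real per-tuple loop calls other functions that could, as far as the compiler can prove, modify the struct, so the load has to be repeated unless the source takes the local copy explicitly.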
Since this ignore_killed_tuples change is also very simple, and also
seems like an easy win, I think that it can be committed as part of the
second patch, without needing to wait for too much more performance
validation.

Attached are 2 text files showing pgbench output/summary info, generated
by my test script (both are from runs that took place within the last 2
hours). One of these result sets just confirms what I reported earlier
on, with an unmodified v1 patchset. The other file shows detailed
results for the v1 patchset with the ignore_killed_tuples change also
applied, for the same pgbench config/workload. This second file gives
full details to back up my "~5.5% faster than master" claim.

The pgbench script used for this is as follows:

\set aid random_exponential(1, 100000 * :scale, 3.0)
\set endrange :aid + 2000
SELECT abalance FROM pgbench_accounts WHERE aid between :aid AND :endrange;

I'm deliberately not attaching a new v2 for this ignore_killed_tuples
change right now. The first patch is a few hundred KBs, and I don't want
this email to get held up in moderation. I have attached the
ignore_killed_tuples change as its own patch, though (with a .txt
extension, just to avoid confusing CFTester).

--
Peter Geoghegan