Re: Adding skip scan (including MDAM style range skip scan) to nbtree

Peter Geoghegan <pg@bowt.ie>

From: Peter Geoghegan <pg@bowt.ie>
To: BharatDB <bharatdbpg@gmail.com>
Cc: Tomas Vondra <tomas@vondra.me>, pgsql-hackers@lists.postgresql.org, pgsql-hackers@postgresql.org, Matthias van de Meent <boekewurm+postgres@gmail.com>, rmt@lists.postgresql.org, Mark Dilger <mark.dilger@enterprisedb.com>, Heikki Linnakangas <hlinnaka@iki.fi>, Alexander Korotkov <aekorotkov@gmail.com>
Date: 2025-09-10T19:27:36Z
Lists: pgsql-hackers

Commits

  1. nbtree: Always set skipScan flag on rescan.

  2. meson: Build numeric.c with -ftree-vectorize.

  3. Fix "variable not found in subplan target lists" in semijoin de-duplication.

  4. Revert "nbtree: Remove useless row compare arg."

  5. nbtree: Remove useless row compare arg.

  6. Prevent premature nbtree array advancement.

  7. nbtree: tighten up array recheck rules.

  8. Avoid treating nonrequired nbtree keys as required.

  9. Adjust overstrong nbtree skip array assertion.

  10. Make NULL tuple values always advance skip arrays.

  11. Avoid extra index searches through preprocessing.

  12. Improve nbtree skip scan primitive scan scheduling.

  13. Further optimize nbtree search scan key comparisons.

  14. Add nbtree skip scan optimization.

  15. Improve nbtree array primitive scan scheduling.

  16. nbtree: Make BTMaxItemSize into object-like macro.

  17. Show index search count in EXPLAIN ANALYZE, take 2.

  18. Make parallel nbtree index scans use an LWLock.

  19. Show index search count in EXPLAIN ANALYZE.

  20. Avoid nbtree parallel scan currPos confusion.

  21. nbtree: Remove useless 'strat' local variable.

  22. Normalize nbtree truncated high key array behavior.

  23. Refactor handling of nbtree array redundancies.

  24. Fix nbtree pgstats accounting with parallel scans.

  25. Avoid parallel nbtree index scan hangs with SAOPs.

  26. Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE

  27. Enhance nbtree ScalarArrayOp execution.

  28. Skip checking of scan keys required for directional scan in B-tree

  29. Instead of using a numberOfRequiredKeys count to distinguish required

On Wed, Sep 10, 2025 at 2:49 AM BharatDB <bharatdbpg@gmail.com> wrote:
> As a follow-up to the skip scan regression discussion, I tested a small patch that introduces static allocation/caching of `IndexAmRoutine` objects in `amapi.c`, removing the malloc/free overhead.

I think that it's too late to be considering anything this invasive for 18.

> Test setup :
> - Baseline: PG17 (commit before skip scan)
> - After: PG18 build with skip scan (patched)
> - pgbench scale=1, 100 partitions
> - Query: `select count(*) from pgbench_accounts where bid = 0`
> - Clients: 1, 4, 32
> - Protocols: simple, prepared
>
> Results (tps, 10s runs) :
>
> Mode      Clients  Before (PG17)  After (PG18 w/ static fix)
> --------  -------  -------------  --------------------------
> simple          1          23856  20332 (~15% lower)
> simple          4          55299  53184 (~4% lower)
> simple         32          79779  78347 (~2% lower)
>
> prepared        1          26364  26615 (no regression)
> prepared        4          55784  54437 (~2% lower)
> prepared       32          84687  80374 (~5% lower)
>
> This shows the static fix eliminates the severe ~50% regression previously observed by Tomas, leaving only a small residual slowdown (~2-15%).
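
For reference, the percentages in the quoted table can be rechecked from the raw tps figures. This is a minimal sketch in Python; the name `pct_change` and the `RESULTS` layout are illustrative only and do not come from the thread or from any PostgreSQL tooling.

```python
# tps figures copied from the quoted results table:
# (protocol, clients) -> (PG17 baseline, PG18 with the static-allocation patch)
RESULTS = {
    ("simple", 1): (23856, 20332),
    ("simple", 4): (55299, 53184),
    ("simple", 32): (79779, 78347),
    ("prepared", 1): (26364, 26615),
    ("prepared", 4): (55784, 54437),
    ("prepared", 32): (84687, 80374),
}


def pct_change(before, after):
    """Relative change of `after` versus `before`, in percent.

    Negative values mean the patched PG18 build was slower than PG17.
    """
    return (after - before) / before * 100.0


for (protocol, clients), (before, after) in RESULTS.items():
    print(f"{protocol:<8} {clients:>2}  {pct_change(before, after):+.1f}%")
```

The prepared, single-client row comes out slightly positive, which matches the "no regression" label in the table; every other row is negative.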

The regression that Tomas reported is extreme and artificial. IIRC it
only affects queries against a partitioned table with a hundred or so
partitions, each with an index-only scan that always scans exactly 0
index tuples, from a pgbench_accounts that has the smallest possible
number of rows that pgbench will allow (these are the cheapest
possible index-only scans). Plain index scans are not affected at all,
presumably because it just so happens that we don't allocate a
BLCKSZ*2 workspace for plain index scans, which is enough to put us
well under the critical glibc allocation size threshold (the threshold
that the introduction of a new nbtree support function put us over).

I also couldn't see anything like the 50% regression that Tomas
reported. And I couldn't recreate any problem unless partitioning was
used.

-- 
Peter Geoghegan