Thread

Commits

  1. Add nbtree skip scan optimization.

  1. Re: Adding skip scan (including MDAM style range skip scan) to nbtree

    BharatDB <bharatdbpg@gmail.com> — 2025-09-08T03:54:46Z

    Dear Team,
    
    With reference to the ongoing conversation in message ID
    c562dc2a-6e36-46f3-a5ea-cd42eebd7118, I am writing to express my interest
    in contributing to the work on fixing the regression discussed in "Adding
    skip scan (including MDAM style range skip scan) to nbtree".
    
    I have been following this discussion on the regression related to commit
    92fe23d93aa (skip scan in nbtree), and I ran some tests on my side to
    understand it better.

    Observations:
    
       - I reproduced Tomas’s pgbench test with a simple workload on a
         single-column index:

         SELECT count(*) FROM pgbench_accounts WHERE bid = 0;

       - Throughput with the skip-scan build was consistently ~40–50% lower
         than with pre-patch builds.

       - After setting MALLOC_TOP_PAD_ to 64MB, the performance gap disappeared
         almost entirely, confirming that the issue is allocator overhead from
         frequent malloc/free calls rather than the skip-scan logic itself.
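
A minimal sketch of how the allocator setting in the last observation can be applied. glibc reads MALLOC_TOP_PAD_ as a byte count, so 64MB is written out in bytes. The helper name restart_with_top_pad is hypothetical, and the data directory and log file names are assumptions that match the reproduction steps.

```shell
# 64MB expressed in bytes, since glibc interprets MALLOC_TOP_PAD_ as bytes.
top_pad_bytes=$((64 * 1024 * 1024))
echo "MALLOC_TOP_PAD_=${top_pad_bytes}"

# Hypothetical helper: restart the server with the variable in its
# environment. Backends forked by the postmaster inherit it.
# "data" and "pg.log" are assumed names, not requirements.
restart_with_top_pad() {
  MALLOC_TOP_PAD_="$top_pad_bytes" pg_ctl -D data -l pg.log restart
}
```

The setting has to reach the server processes, not the pgbench client, which is why it is attached to the pg_ctl restart rather than to the benchmark command.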
    
    Reproduction steps:
    
    Here is the exact setup I used (very close to Tomas’s):
    
    # init database
    pg_ctl -D data init
    pg_ctl -D data -l pg.log start
    createdb test
    # load pgbench data (scale 1); pgbench -i drops and recreates
    # pgbench_accounts, so the index has to be created afterwards
    pgbench -i -s 1 test
    # create index on bid
    psql test -c 'CREATE INDEX ON pgbench_accounts(bid);'
    # custom query file (select.sql)
    echo "SELECT count(*) FROM pgbench_accounts WHERE bid = 0;" > select.sql
    # run benchmarks
    for m in simple prepared; do
      for c in 1 4 32; do
        pgbench -n -f select.sql -M $m -T 10 -c $c -j $c test | grep tps
      done
    done
    
    When running the above, the skip-scan build consistently showed ~50% lower
    tps compared to pre-patch, unless MALLOC_TOP_PAD_ was increased.
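
To put a number on a gap like this, the tps lines that the loop greps out can be compared directly. The following is a sketch only: the function names are hypothetical, and the figures in the final call are made up to show the arithmetic, not measurements.

```shell
# Pull the tps figure out of a pgbench summary line such as
#   tps = 1234.567890 (without initial connection time)
extract_tps() {
  awk '/^tps = / { print $3 }'
}

# Percent change of a candidate build's tps relative to a baseline build.
tps_change_pct() {
  awk -v base="$1" -v cand="$2" \
    'BEGIN { printf "%.1f\n", (cand - base) / base * 100 }'
}

# Made-up figures: a baseline of 1000 tps and a candidate of 500 tps.
tps_change_pct 1000 500   # prints -50.0
```

In practice each pgbench run would be piped through extract_tps, and the two resulting values passed to tps_change_pct.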

    Thoughts on causes:
    
       - The increase in IndexAmRoutine size seems to push the cache structures
         past glibc’s small-heap thresholds, forcing more system allocations.

       - As Tomas noted, this is fragile: even if we drop the unused options
         support proc, future extensions to the struct could trigger the same
         issue again.
    
    Suggestions / possible directions:
    
       1. *Short term (PG18)*:

          - If we want a low-risk change, removing the unused options support
            function may be acceptable, but I agree it feels like a temporary
            band-aid.

          - Alternatively, shipping PG18 as-is with a release note warning about
            allocator sensitivity might be the safest option.

       2. *Longer term (PG19)*:

          - Explore *static allocation of IndexAmRoutine* instead of per-AM
            dynamic allocation. This should eliminate repeated malloc churn.

          - Add a micro-benchmark or regression test that stresses catalog cache
            growth and malloc behavior (similar to pgbench with many partitions),
            so allocator-driven regressions are detected earlier.

          - Consider documenting allocator tuning (MALLOC_TOP_PAD_) as a
            workaround until the structural fix lands.
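
One possible shape for the micro-benchmark suggested above is an A/B run over partitioned pgbench tables, once with glibc defaults and once with a larger top pad. This is a sketch under assumptions: the partition count, data directory, and function names are hypothetical, and it prints the commands by default (DRY_RUN=1) instead of executing them.

```shell
#!/usr/bin/env bash
# A/B allocator benchmark plan. With DRY_RUN=1 (the default) commands are
# only printed, so the plan can be reviewed before anything is run.
DRY_RUN="${DRY_RUN:-1}"
PGDATA_DIR="${PGDATA_DIR:-data}"   # assumed data directory
PARTITIONS="${PARTITIONS:-100}"    # hypothetical partition count
TOP_PAD_BYTES=$((64 * 1024 * 1024))

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

benchmark_plan() {
  # Partitioned tables give each backend more relations to cache.
  run pgbench -i -s 1 --partitions="$PARTITIONS" test

  # Pass 1: glibc defaults (variable unset). Pass 2: 64MB top pad.
  for pad in default "$TOP_PAD_BYTES"; do
    if [ "$pad" = default ]; then
      run env -u MALLOC_TOP_PAD_ pg_ctl -D "$PGDATA_DIR" -l pg.log restart
    else
      run env MALLOC_TOP_PAD_="$pad" pg_ctl -D "$PGDATA_DIR" -l pg.log restart
    fi
    for c in 1 4 32; do
      run pgbench -n -S -M prepared -T 10 -c "$c" -j "$c" test
    done
  done
}

benchmark_plan
```

Running it with DRY_RUN=0 would execute the plan against a local server. A large tps difference between the two passes would point at allocator behavior rather than at the scan code.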
    
    Closing:
    
    I don’t have a final patch proposal at this stage, but I would like to help
    test any candidate fixes or prototypes. If there’s interest, I can also
    contribute a self-contained benchmark script for regression testing.
    
    Regards,
    Athiyaman