Re: Adding skip scan (including MDAM style range skip scan) to nbtree

From: Natalya Aksman <natalya@tigerdata.com>
To: Peter Geoghegan <pg@bowt.ie>
Cc: Masahiro Ikeda <ikedamsh@oss.nttdata.com>, Tomas Vondra <tomas@vondra.me>, Masahiro.Ikeda@nttdata.com, pgsql-hackers@lists.postgresql.org, Masao.Fujii@nttdata.com
Date: 2025-09-10T18:59:02Z
Lists: pgsql-hackers

Commits

  1. nbtree: Always set skipScan flag on rescan.

  2. meson: Build numeric.c with -ftree-vectorize.

  3. Fix "variable not found in subplan target lists" in semijoin de-duplication.

  4. Revert "nbtree: Remove useless row compare arg."

  5. nbtree: Remove useless row compare arg.

  6. Prevent premature nbtree array advancement.

  7. nbtree: tighten up array recheck rules.

  8. Avoid treating nonrequired nbtree keys as required.

  9. Adjust overstrong nbtree skip array assertion.

  10. Make NULL tuple values always advance skip arrays.

  11. Avoid extra index searches through preprocessing.

  12. Improve nbtree skip scan primitive scan scheduling.

  13. Further optimize nbtree search scan key comparisons.

  14. Add nbtree skip scan optimization.

  15. Improve nbtree array primitive scan scheduling.

  16. nbtree: Make BTMaxItemSize into object-like macro.

  17. Show index search count in EXPLAIN ANALYZE, take 2.

  18. Make parallel nbtree index scans use an LWLock.

  19. Show index search count in EXPLAIN ANALYZE.

  20. Avoid nbtree parallel scan currPos confusion.

  21. nbtree: Remove useless 'strat' local variable.

  22. Normalize nbtree truncated high key array behavior.

  23. Refactor handling of nbtree array redundancies.

  24. Fix nbtree pgstats accounting with parallel scans.

  25. Avoid parallel nbtree index scan hangs with SAOPs.

  26. Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE

  27. Enhance nbtree ScalarArrayOp execution.

  28. Skip checking of scan keys required for directional scan in B-tree

  29. Instead of using a numberOfRequiredKeys count to distinguish required

TimescaleDB implements a multikey skip scan feature for queries like "select
distinct key1, key2 ... from t_indexed_on_key1_key2". It pins key1 to a
found key value (i.e. key1=val1) to skip over distinct values of key2. Then,
after the values for key1=val1 are exhausted, the next distinct tuple is
searched for with key1>val1.
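
To make the access pattern concrete, here is a minimal sketch in C over a sorted in-memory array rather than a real index. It is a toy model only, not TimescaleDB or nbtree code: Pair, Key1Strategy, search and distinct_pairs are hypothetical names made up for illustration, and the binary search stands in for an index descent.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an index tuple on (key1, key2). Hypothetical type. */
typedef struct { int key1; int key2; } Pair;

/* The two scan key shapes the scan alternates between. */
typedef enum { KEY1_EQ, KEY1_GT } Key1Strategy;

/*
 * Binary search over rows[0..n), sorted by (key1, key2).
 *   KEY1_EQ: first row with key1 = val1 AND key2 > val2, or n if none.
 *   KEY1_GT: first row with key1 > val1 (val2 is ignored), or n if none.
 */
static size_t search(const Pair *rows, size_t n, Key1Strategy strat,
                     int val1, int val2)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int before;

        if (strat == KEY1_EQ)
            before = rows[mid].key1 < val1 ||
                     (rows[mid].key1 == val1 && rows[mid].key2 <= val2);
        else
            before = rows[mid].key1 <= val1;

        if (before)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* With key1 pinned, landing on a different key1 means "no match". */
    if (strat == KEY1_EQ && (lo == n || rows[lo].key1 != val1))
        return n;
    return lo;
}

/* Writes the distinct (key1, key2) pairs to out[] and returns their count. */
size_t distinct_pairs(const Pair *rows, size_t n, Pair *out)
{
    size_t count = 0;
    size_t pos = 0;

    assert(rows != NULL || n == 0);
    while (pos < n) {
        Pair cur = rows[pos];

        out[count++] = cur;
        /* Pin key1 = cur.key1 and skip to the next distinct key2. */
        pos = search(rows, n, KEY1_EQ, cur.key1, cur.key2);
        if (pos == n)
            /* key1 = val1 is exhausted: switch the qual to key1 > val1. */
            pos = search(rows, n, KEY1_GT, cur.key1, 0);
    }
    return count;
}
```

The point of the sketch is that each step changes the operator on key1 (= versus >), not only the comparison value.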

In short, this implementation can change the scan key structure from
"key1=val1" to "key1>val1" and back, not just the key comparison value
(i.e. val1).
This means that so->skipScan may need to change from true to false on the
next call to _bt_preprocess_keys.

But after btrescan sets "so->numberOfKeys = 0", so->skipScan is not reset
to "false" in _bt_preprocess_keys because of this code:
https://github.com/postgres/postgres/blob/9016fa7e3bcde8ae4c3d63c707143af147486a10/src/backend/access/nbtree/nbtpreprocesskeys.c#L1847
After "so->numberOfKeys = 0" is set, we quit on line 1847 before we get to
line 1874, where we do "so->skipScan = (numSkipArrayKeys > 0);"
https://github.com/postgres/postgres/blob/9016fa7e3bcde8ae4c3d63c707143af147486a10/src/backend/access/nbtree/nbtpreprocesskeys.c#L1874

I.e. if btrescan sets "so->numberOfKeys = 0", _bt_preprocess_keys quits
before resetting so->skipScan to false.
This is not an issue when the scan key structure is unchanged in amrescan,
and I see that this is the intended usage.
But if the intended amrescan usage changes in the future, the issue
may come up.
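
The stale-flag pattern can be sketched with a toy model in C. This is not the nbtree source: ToyScanOpaque and the toy_* functions are hypothetical, and the early-exit condition (no array keys needed) is an assumption made for illustration, since the condition at the linked line is not restated in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the scan's opaque state. Hypothetical type. */
typedef struct {
    int  numberOfKeys;  /* 0 means preprocessing must run again */
    bool skipScan;      /* set by preprocessing when skip arrays are used */
} ToyScanOpaque;

/* Models a rescan: forces re-preprocessing but leaves skipScan alone. */
void toy_rescan(ToyScanOpaque *so)
{
    so->numberOfKeys = 0;
}

/* Models the workaround of resetting the flag from the calling extension. */
void toy_rescan_with_reset(ToyScanOpaque *so)
{
    toy_rescan(so);
    so->skipScan = false;
}

/*
 * Models preprocessing with an early exit placed before the flag assignment.
 * ASSUMPTION: the exit fires when no array keys are needed.
 */
void toy_preprocess(ToyScanOpaque *so, int numKeys,
                    int numArrayKeys, int numSkipArrayKeys)
{
    assert(numSkipArrayKeys <= numArrayKeys);
    so->numberOfKeys = numKeys;
    if (numArrayKeys == 0)
        return;                 /* early exit: skipScan keeps its old value */
    so->skipScan = (numSkipArrayKeys > 0);
}
```

In this model, preprocessing with a skip array, then toy_rescan, then preprocessing again with no arrays leaves skipScan at true. Using toy_rescan_with_reset in place of toy_rescan clears it.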

It's not a priority at the moment as we can reset so->skipScan in our
extension.

Thank you,
Natalya Aksman.



On Wed, Sep 10, 2025 at 12:46 PM Peter Geoghegan <pg@bowt.ie> wrote:

> On Wed, Sep 10, 2025 at 9:53 AM Natalya Aksman <natalya@tigerdata.com>
> wrote:
> > Our Timescaledb extension has scenarios changing ">" quals to "=" and
> back on rescan and it breaks when so->Skipscan needs to be reset from true
> to false.
>
> But the amrescan docs say:
>
> "In practice the restart feature is used when a new outer tuple is
> selected by a nested-loop join and so a new key comparison value is
> needed, but the scan key structure remains the same" [1].
>
> I don't understand why it is that our not resetting the so->Skipscan
> flag within btrescan has any particular significance to Timescaledb,
> relative to all of the other fields that are supposed to be set by
> _bt_preprocess_keys. What is the actual failure you see? Is it an
> assertion failure within _bt_readpage/_bt_checkkeys?
>
> Note that btrescan *does* set "so->numberOfKeys = 0", which will make
> the next call to _bt_preprocess_keys (from _bt_first) perform
> preprocessing from scratch. This should set so->Skipscan from scratch
> on each rescan (along with every other field set by preprocessing). It
> seems like that should work for you (in spite of the fact that you're
> doing something that seems at odds with the index AM API).
>
> [1] https://www.postgresql.org/docs/current/index-functions.html
> --
> Peter Geoghegan
>