Re: Adding skip scan (including MDAM style range skip scan) to nbtree

From: Peter Geoghegan <pg@bowt.ie>
To: Matthias van de Meent <boekewurm+postgres@gmail.com>
Cc: Heikki Linnakangas <hlinnaka@iki.fi>, Masahiro Ikeda <ikedamsh@oss.nttdata.com>, Tomas Vondra <tomas@vondra.me>, Masahiro.Ikeda@nttdata.com, pgsql-hackers@lists.postgresql.org, Masao.Fujii@nttdata.com
Date: 2025-03-22T17:47:41Z
Lists: pgsql-hackers

Commits

  1. nbtree: Always set skipScan flag on rescan.

  2. meson: Build numeric.c with -ftree-vectorize.

  3. Fix "variable not found in subplan target lists" in semijoin de-duplication.

  4. Revert "nbtree: Remove useless row compare arg."

  5. nbtree: Remove useless row compare arg.

  6. Prevent premature nbtree array advancement.

  7. nbtree: tighten up array recheck rules.

  8. Avoid treating nonrequired nbtree keys as required.

  9. Adjust overstrong nbtree skip array assertion.

  10. Make NULL tuple values always advance skip arrays.

  11. Avoid extra index searches through preprocessing.

  12. Improve nbtree skip scan primitive scan scheduling.

  13. Further optimize nbtree search scan key comparisons.

  14. Add nbtree skip scan optimization.

  15. Improve nbtree array primitive scan scheduling.

  16. nbtree: Make BTMaxItemSize into object-like macro.

  17. Show index search count in EXPLAIN ANALYZE, take 2.

  18. Make parallel nbtree index scans use an LWLock.

  19. Show index search count in EXPLAIN ANALYZE.

  20. Avoid nbtree parallel scan currPos confusion.

  21. nbtree: Remove useless 'strat' local variable.

  22. Normalize nbtree truncated high key array behavior.

  23. Refactor handling of nbtree array redundancies.

  24. Fix nbtree pgstats accounting with parallel scans.

  25. Avoid parallel nbtree index scan hangs with SAOPs.

  26. Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE

  27. Enhance nbtree ScalarArrayOp execution.

  28. Skip checking of scan keys required for directional scan in B-tree

  29. Instead of using a numberOfRequiredKeys count to distinguish required

On Fri, Mar 21, 2025 at 11:36 AM Peter Geoghegan <pg@bowt.ie> wrote:
> A big part of the concern here is with the existing pstate.prechecked
> optimization (the one added to Postgres 17 by Alexander Korotkov's
> commit e0b1ee17). It now seems quite redundant -- the new
> _bt_skip_ikeyprefix mechanism added by my 0003-* patch does the same
> thing, but does it better (especially since I taught
> _bt_skip_ikeyprefix to deal with simple inequalities in v29). I now
> think that it makes most sense to totally replace pstate.prechecked
> with _bt_skip_ikeyprefix -- we should use _bt_skip_ikeyprefix during
> every scan (not just during skip scans, not just during scans with
> SAOP array keys), and be done with it.

I just committed "Improve nbtree array primitive scan scheduling".

Attached is v30, which fully replaces the pstate.prechecked
optimization with the new _bt_skip_ikeyprefix optimization (which now
appears in v30-0002-Lower-nbtree-skip-array-maintenance-overhead.patch,
and not in 0003-*, due to my committing the primscan scheduling patch
just now).

I'm now absolutely convinced that fully generalizing
_bt_skip_ikeyprefix (as described in yesterday's email) is the right
direction to take things in. It seems to have no possible downside.

> Under this new scheme, so->scanBehind is strictly a flag that
> indicates that a recheck is scheduled, to be performed once the scan
> calls _bt_readpage for the next page. It no longer serves role #1,
> only role #2. That seems significantly simpler.

I especially like this about the new _bt_skip_ikeyprefix scheme.
Having so->scanBehind strictly be a flag (that tracks if we need a
recheck at the start of reading the next page) substantially lowers
the cognitive burden for somebody trying to understand how the
primitive scan scheduling stuff works.
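
To illustrate the "strictly a flag" idea, here is a toy model in C. It is
only a sketch and is not the nbtree code: ToyScan, ToyPage, toy_advance_key
and toy_readpage are hypothetical names made up for this example, and the
recheck rule (compare the page's final tuple against the current array key)
is a simplifying assumption. The point is just that the flag is set in one
place and consumed in one place, at the start of reading the next page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; none of these exist in nbtree. */
typedef struct ToyPage
{
    const int  *tuples;         /* tuple values, sorted ascending */
    size_t      ntuples;
} ToyPage;

typedef struct ToyScan
{
    int         array_key;      /* array element the scan currently wants */
    bool        scan_behind;    /* is a recheck scheduled for the next page? */
} ToyScan;

typedef enum ToyResult
{
    TOY_CONTINUE,               /* page was read; move on to the next page */
    TOY_NEW_PRIMSCAN            /* page is behind the key; re-descend instead */
} ToyResult;

/* Advance the array key and schedule a recheck for the next page. */
void
toy_advance_key(ToyScan *scan, int new_key)
{
    scan->array_key = new_key;
    scan->scan_behind = true;
}

/* Read one page, performing the scheduled recheck first (if any). */
ToyResult
toy_readpage(ToyScan *scan, const ToyPage *page, size_t *nmatches)
{
    assert(page->ntuples > 0);
    *nmatches = 0;

    if (scan->scan_behind)
    {
        int         finaltup = page->tuples[page->ntuples - 1];

        /* The flag only means "recheck now"; it is consumed right here. */
        scan->scan_behind = false;
        if (finaltup < scan->array_key)
            return TOY_NEW_PRIMSCAN;    /* whole page is before the key */
    }

    /* Ordinary per-tuple processing: count tuples equal to the key. */
    for (size_t i = 0; i < page->ntuples; i++)
    {
        if (page->tuples[i] == scan->array_key)
            (*nmatches)++;
    }

    return TOY_CONTINUE;
}
```

In this toy, a caller that advances the key to 10 and then reads a page
holding {1, 2, 3} gets TOY_NEW_PRIMSCAN back, while a page holding
{9, 10, 10} is read normally and reports two matches. Either way the flag
is clear again once toy_readpage returns.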

The newly expanded _bt_skip_ikeyprefix needs quite a bit more testing
and polishing to be committable. I didn't even update the relevant
commit message for v30. Plus I'm not completely sure what to do about
RowCompare keys just yet, which have some funny rules when dealing
with NULLs.

-- 
Peter Geoghegan