Re: Adding skip scan (including MDAM style range skip scan) to nbtree

Peter Geoghegan <pg@bowt.ie>

From: Peter Geoghegan <pg@bowt.ie>
To: Alena Rybakina <a.rybakina@postgrespro.ru>
Cc: Matthias van de Meent <boekewurm+postgres@gmail.com>, Heikki Linnakangas <hlinnaka@iki.fi>, Masahiro Ikeda <ikedamsh@oss.nttdata.com>, Tomas Vondra <tomas@vondra.me>, Masahiro.Ikeda@nttdata.com, pgsql-hackers@lists.postgresql.org, Masao.Fujii@nttdata.com
Date: 2025-03-28T20:15:19Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. nbtree: Always set skipScan flag on rescan.

  2. meson: Build numeric.c with -ftree-vectorize.

  3. Fix "variable not found in subplan target lists" in semijoin de-duplication.

  4. Revert "nbtree: Remove useless row compare arg."

  5. nbtree: Remove useless row compare arg.

  6. Prevent premature nbtree array advancement.

  7. nbtree: tighten up array recheck rules.

  8. Avoid treating nonrequired nbtree keys as required.

  9. Adjust overstrong nbtree skip array assertion.

  10. Make NULL tuple values always advance skip arrays.

  11. Avoid extra index searches through preprocessing.

  12. Improve nbtree skip scan primitive scan scheduling.

  13. Further optimize nbtree search scan key comparisons.

  14. Add nbtree skip scan optimization.

  15. Improve nbtree array primitive scan scheduling.

  16. nbtree: Make BTMaxItemSize into object-like macro.

  17. Show index search count in EXPLAIN ANALYZE, take 2.

  18. Make parallel nbtree index scans use an LWLock.

  19. Show index search count in EXPLAIN ANALYZE.

  20. Avoid nbtree parallel scan currPos confusion.

  21. nbtree: Remove useless 'strat' local variable.

  22. Normalize nbtree truncated high key array behavior.

  23. Refactor handling of nbtree array redundancies.

  24. Fix nbtree pgstats accounting with parallel scans.

  25. Avoid parallel nbtree index scan hangs with SAOPs.

  26. Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE

  27. Enhance nbtree ScalarArrayOp execution.

  28. Skip checking of scan keys required for directional scan in B-tree

  29. Instead of using a numberOfRequiredKeys count to distinguish required

On Thu, Mar 27, 2025 at 6:03 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
> I replied an example like this:

This example shows costs that are dominated by heap access costs. Both
the sequential scan and the bitmap heap scan must access 637 heap
blocks. So I don't think that this is such a great example -- the heap
accesses are irrelevant from the point of view of assessing how well
we're modelling index scan related costs.

> I think it would be useful to show information that we used an index scan but at the same time we skipped the "region" column and I assume we should output how many distinct values the "region" column had.
>
> For example it will look like this "Skip Scan on region (4 distinct values)":

> What do you think?

As I said on our call today, I think that we should keep the output
for EXPLAIN ANALYZE simple. While I'm sympathetic to the idea that we
should show more information about how quals can be applied in index
scan node output, that seems like it should be largely independent
work to me.

Masahiro Ikeda wrote a patch that aimed to improve matters in this
area some months back. I'm supportive of that (there is definitely
value in signalling to users that the index might actually look quite
different to how they imagine it looks, say by having an
omitted-by-query prefix attribute). I don't exactly know what the most
useful kind of information to show is with skip scan in place, since
skip scan makes the general nature of quals (whether a given qual is
what oracle calls "access predicates", or what oracle calls "filter
predicates") is made squishy/dynamic by skip scan, in a way that is
new.

The relationship between the number of values that a skip array ever
uses, and the number of primitive index scans is quite complicated.
Sometimes it is actually as simple as your example query, but that's
often not true. "Index Searches: N" can be affected by:

* The use of SAOP arrays, which also influence primitive scan
scheduling, in the same way as they have since Postgres 17 -- and can
be mixed freely with skip arrays.

* The availability of opclass skipsupport, which makes skip arrays
generate their element values by addition/subtraction from the current
array element, rather than using NEXT/PRIOR sentinel keys.

The sentinel keys act as probes that get the next real (non-sentinel)
value that we need to look up next. Whereas skip support can often
successfully guess that (for example) the next value in the index
after 268 is 269, saving a primitive scan each time (this might not
happen at all, or it might work only some of the time, or it might
work all of the time).

* Various primitive index scan scheduling heuristics.

Another concern here is that I don't want to invent a special kind of
"index search" just for skip scan. We're going to show an "Index
Searches: N" that's greater than 1 with SAOP array keys, too -- which
don't use skip scan at all (nothing new about that, except for the
fact that we report the number of searches directly from EXPLAIN
ANALYZE in Postgres 18). There really is almost no difference between
a scan with a skip array and a scan of the same index with a similar
SAOP array (when each array "contains the same elements", and is used
to scan the same index, in the same way). That's why the cost model is
as similar as possible to the Postgres 17 costing of SAOP array scans
-- it's really the same access method. Reusing the cost model makes
sense because actual execution times are almost identical when we
compare a skip array to a SAOP array in the way that I described.

The only advantage that I see from putting something about "skip scan"
in EXPLAIN ANALYZE is that it is more googleable that way. But it
seems like "Index Searches: N" is almost as good, most of the time. In
any case, the fact that we don't need a separate optimizer index
path/executor node for this is something that I see as a key
advantage, and something that I'd like EXPLAIN ANALYZE to preserve.

The problem with advertising that an index scan node is a skip scan
is: what if it just never skips? Never skipping like this isn't
necessarily unexpected. And even if it is unexpected, it's not
necessarily a problem.

> I didn't see any regression tests. Maybe we should add some tests? To be honest I didn't see it mentioned in the commit message but I might have missed something.

There are definitely new regression tests -- I specifically tried to
keep the test coverage high, using gcov html reports (like the ones
from coverage.postgresql.org). The test updates appear towards the end
of the big patch file, though. Maybe you're just not used to seeing
tests appear last like this?

I use "git config diff.orderfile ... " to get this behavior. I find it
useful to put the important changes (particularly header file changes)
first, and less important changes (like tests) much later.

Thanks for taking a look at my patch!
-- 
Peter Geoghegan