Re: Adding skip scan (including MDAM style range skip scan) to nbtree

From: Peter Geoghegan <pg@bowt.ie>

To: Alena Rybakina <a.rybakina@postgrespro.ru>

Cc: Matthias van de Meent <boekewurm+postgres@gmail.com>, Heikki Linnakangas <hlinnaka@iki.fi>, Masahiro Ikeda <ikedamsh@oss.nttdata.com>, Tomas Vondra <tomas@vondra.me>, Masahiro.Ikeda@nttdata.com, pgsql-hackers@lists.postgresql.org, Masao.Fujii@nttdata.com

Date: 2025-03-28T20:15:19Z

Lists: pgsql-hackers

Commits

nbtree: Always set skipScan flag on rescan.
- 454c046094ab 19 (unreleased) landed
- bee763aea13f 18.0 landed
meson: Build numeric.c with -ftree-vectorize.
- 9016fa7e3bcd 19 (unreleased) cited
Fix "variable not found in subplan target lists" in semijoin de-duplication.
- b8a1bdc458e3 19 (unreleased) cited
Revert "nbtree: Remove useless row compare arg."
- dd2ce3792754 18.0 landed
nbtree: Remove useless row compare arg.
- 54c6ea8c81db 18.0 cited
Prevent premature nbtree array advancement.
- 5f4d98d4f371 18.0 landed
nbtree: tighten up array recheck rules.
- 7e25c9363a82 18.0 landed
Avoid treating nonrequired nbtree keys as required.
- 0f08df406822 18.0 landed
Adjust overstrong nbtree skip array assertion.
- 9d924dbb3710 18.0 landed
Make NULL tuple values always advance skip arrays.
- b75fedcab791 18.0 cited
Avoid extra index searches through preprocessing.
- b3f1a13f22f9 18.0 landed
Improve nbtree skip scan primitive scan scheduling.
- 21a152b37f36 18.0 landed
Further optimize nbtree search scan key comparisons.
- 8a510275dd6b 18.0 landed
Add nbtree skip scan optimization.
- 92fe23d93aa3 18.0 landed
Improve nbtree array primitive scan scheduling.
- 9a2e2a285a14 18.0 landed
nbtree: Make BTMaxItemSize into object-like macro.
- 426ea611171d 18.0 landed
Show index search count in EXPLAIN ANALYZE, take 2.
- 0fbceae841cb 18.0 landed
Make parallel nbtree index scans use an LWLock.
- 67fc4c9fd7fa 18.0 landed
Show index search count in EXPLAIN ANALYZE.
- 5ead85fbc811 18.0 landed
Avoid nbtree parallel scan currPos confusion.
- b5ee4e52026b 18.0 cited
nbtree: Remove useless 'strat' local variable.
- b6558e4f837e 18.0 landed
Normalize nbtree truncated high key array behavior.
- 79fa7b3b1a44 18.0 landed
Refactor handling of nbtree array redundancies.
- b524974106ac 18.0 landed
Fix nbtree pgstats accounting with parallel scans.
- c00c54a9ac1e 18.0 landed
- fb4f5e58af97 17.0 landed
Avoid parallel nbtree index scan hangs with SAOPs.
- d8adfc18bebf 18.0 landed
- a24bffc021d9 17.0 landed
Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE
- 5a1e6df3b84c 18.0 cited
Enhance nbtree ScalarArrayOp execution.
- 5bf748b86bc6 17.0 cited
Skip checking of scan keys required for directional scan in B-tree
- e0b1ee17dc3a 17.0 cited
Instead of using a numberOfRequiredKeys count to distinguish required
- 7ccaf13a06b8 8.2.0 cited

On Thu, Mar 27, 2025 at 6:03 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
> I replied an example like this:

This example shows costs that are dominated by heap access costs. Both
the sequential scan and the bitmap heap scan must access 637 heap
blocks. So I don't think that this is such a great example -- the heap
accesses are irrelevant from the point of view of assessing how well
we're modelling index scan related costs.

> I think it would be useful to show information that we used an index scan but at the same time we skipped the "region" column and I assume we should output how many distinct values the "region" column had.
>
> For example it will look like this "Skip Scan on region (4 distinct values)":

> What do you think?

As I said on our call today, I think that we should keep the output
for EXPLAIN ANALYZE simple. While I'm sympathetic to the idea that we
should show more information about how quals can be applied in index
scan node output, that seems like it should be largely independent
work to me.

Masahiro Ikeda wrote a patch that aimed to improve matters in this
area some months back. I'm supportive of that (there is definitely
value in signalling to users that the index might actually look quite
different to how they imagine it looks, say by having an
omitted-by-query prefix attribute). I don't exactly know what the most
useful kind of information to show is with skip scan in place, since
the general nature of quals (whether a given qual is what Oracle calls
an "access predicate" or what Oracle calls a "filter predicate") is
made squishy/dynamic by skip scan, in a way that is new.

The relationship between the number of values that a skip array ever
uses, and the number of primitive index scans is quite complicated.
Sometimes it is actually as simple as your example query, but that's
often not true. "Index Searches: N" can be affected by:

* The use of SAOP arrays, which also influence primitive scan
scheduling, in the same way as they have since Postgres 17 -- and can
be mixed freely with skip arrays.

* The availability of opclass skipsupport, which makes skip arrays
generate their element values by addition/subtraction from the current
array element, rather than using NEXT/PRIOR sentinel keys.

The sentinel keys act as probes that find the next real (non-sentinel)
value that we need to look up. Skip support, by contrast, can often
successfully guess that (for example) the next value in the index
after 268 is 269, saving a primitive scan each time (this might not
happen at all, or it might work only some of the time, or it might
work all of the time).

* Various primitive index scan scheduling heuristics.
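
To make the skip support point above concrete, here is a toy Python
model. It is only a sketch and is not how nbtree actually works: the
"index" is a sorted list of (a, b) integer tuples, every descent counts
as one index search, and page-level effects and the scheduling
heuristics are ignored entirely. The names count_searches and descend
are made up for this illustration. The query being modelled is
"WHERE b = b_target", with the leading column a skipped.

```python
from bisect import bisect_left


def count_searches(index, b_target, skip_support):
    """Toy model: return (descents, matches) for `b = b_target` on a
    sorted list of (a, b) tuples, skipping over leading column a."""
    searches = 0
    matches = []

    def descend(key):
        # Each call stands in for one primitive index scan (assumption).
        nonlocal searches
        searches += 1
        return bisect_left(index, key)

    # Initial probe: find the lowest value of a in the index.
    pos = descend((float("-inf"),))
    if pos == len(index):
        return searches, matches
    a = index[pos][0]

    while True:
        # Look up the current array element together with b_target.
        pos = descend((a, b_target))
        while pos < len(index) and index[pos] == (a, b_target):
            matches.append(index[pos])
            pos += 1

        if skip_support:
            if pos == len(index):
                break
            # If we are still inside a's group, guess a + 1 as the next
            # element. Otherwise adopt the real value we landed on.
            a = a + 1 if index[pos][0] == a else index[pos][0]
        else:
            # NEXT-style sentinel probe: first tuple with a' > a.
            pos = descend((a, float("inf")))
            if pos == len(index):
                break
            a = index[pos][0]

    return searches, matches


contiguous = [(a, b) for a in range(1, 5) for b in range(10)]
gapped = [(a, b) for a in (10, 20, 30, 40) for b in range(10)]

for name, idx in (("contiguous", contiguous), ("gapped", gapped)):
    for skip_support in (False, True):
        n, rows = count_searches(idx, 3, skip_support)
        print(name, "skip_support =", skip_support, "searches =", n)
```

In this toy, the contiguous index needs 6 descents with skip support
versus 9 with sentinel probes, while the gapped index needs 9 either
way, because every guess of "a + 1" misses. The real counts reported by
"Index Searches: N" will differ, since none of the other factors in the
list above are modelled here.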

Another concern here is that I don't want to invent a special kind of
"index search" just for skip scan. We're going to show an "Index
Searches: N" that's greater than 1 with SAOP array keys, too -- which
don't use skip scan at all (nothing new about that, except for the
fact that we report the number of searches directly from EXPLAIN
ANALYZE in Postgres 18). There really is almost no difference between
a scan with a skip array and a scan of the same index with a similar
SAOP array (when each array "contains the same elements", and is used
to scan the same index, in the same way). That's why the cost model is
as similar as possible to the Postgres 17 costing of SAOP array scans
-- it's really the same access method. Reusing the cost model makes
sense because actual execution times are almost identical when we
compare a skip array to a SAOP array in the way that I described.

The only advantage that I see from putting something about "skip scan"
in EXPLAIN ANALYZE is that it is more googleable that way. But it
seems like "Index Searches: N" is almost as good, most of the time. In
any case, the fact that we don't need a separate optimizer index
path/executor node for this is something that I see as a key
advantage, and something that I'd like EXPLAIN ANALYZE to preserve.

The problem with advertising that an index scan node is a skip scan
is: what if it just never skips? Never skipping like this isn't
necessarily unexpected. And even if it is unexpected, it's not
necessarily a problem.

> I didn't see any regression tests. Maybe we should add some tests? To be honest I didn't see it mentioned in the commit message but I might have missed something.

There are definitely new regression tests -- I specifically tried to
keep the test coverage high, using gcov html reports (like the ones
from coverage.postgresql.org). The test updates appear towards the end
of the big patch file, though. Maybe you're just not used to seeing
tests appear last like this?

I use "git config diff.orderfile ... " to get this behavior. I find it
useful to put the important changes (particularly header file changes)
first, and less important changes (like tests) much later.

Thanks for taking a look at my patch!
--
Peter Geoghegan