Re: Adding skip scan (including MDAM style range skip scan) to nbtree

Alena Rybakina <a.rybakina@postgrespro.ru>

From: Alena Rybakina <a.rybakina@postgrespro.ru>

To: Peter Geoghegan <pg@bowt.ie>

Cc: Matthias van de Meent <boekewurm+postgres@gmail.com>, Heikki Linnakangas <hlinnaka@iki.fi>, Masahiro Ikeda <ikedamsh@oss.nttdata.com>, Tomas Vondra <tomas@vondra.me>, Masahiro.Ikeda@nttdata.com, pgsql-hackers@lists.postgresql.org, Masao.Fujii@nttdata.com

Date: 2025-04-01T19:08:26Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

nbtree: Always set skipScan flag on rescan.
- 454c046094ab 19 (unreleased) landed
- bee763aea13f 18.0 landed
meson: Build numeric.c with -ftree-vectorize.
- 9016fa7e3bcd 19 (unreleased) cited
Fix "variable not found in subplan target lists" in semijoin de-duplication.
- b8a1bdc458e3 19 (unreleased) cited
Revert "nbtree: Remove useless row compare arg."
- dd2ce3792754 18.0 landed
nbtree: Remove useless row compare arg.
- 54c6ea8c81db 18.0 cited
Prevent premature nbtree array advancement.
- 5f4d98d4f371 18.0 landed
nbtree: tighten up array recheck rules.
- 7e25c9363a82 18.0 landed
Avoid treating nonrequired nbtree keys as required.
- 0f08df406822 18.0 landed
Adjust overstrong nbtree skip array assertion.
- 9d924dbb3710 18.0 landed
Make NULL tuple values always advance skip arrays.
- b75fedcab791 18.0 cited
Avoid extra index searches through preprocessing.
- b3f1a13f22f9 18.0 landed
Improve nbtree skip scan primitive scan scheduling.
- 21a152b37f36 18.0 landed
Further optimize nbtree search scan key comparisons.
- 8a510275dd6b 18.0 landed
Add nbtree skip scan optimization.
- 92fe23d93aa3 18.0 landed
Improve nbtree array primitive scan scheduling.
- 9a2e2a285a14 18.0 landed
nbtree: Make BTMaxItemSize into object-like macro.
- 426ea611171d 18.0 landed
Show index search count in EXPLAIN ANALYZE, take 2.
- 0fbceae841cb 18.0 landed
Make parallel nbtree index scans use an LWLock.
- 67fc4c9fd7fa 18.0 landed
Show index search count in EXPLAIN ANALYZE.
- 5ead85fbc811 18.0 landed
Avoid nbtree parallel scan currPos confusion.
- b5ee4e52026b 18.0 cited
nbtree: Remove useless 'strat' local variable.
- b6558e4f837e 18.0 landed
Normalize nbtree truncated high key array behavior.
- 79fa7b3b1a44 18.0 landed
Refactor handling of nbtree array redundancies.
- b524974106ac 18.0 landed
Fix nbtree pgstats accounting with parallel scans.
- c00c54a9ac1e 18.0 landed
- fb4f5e58af97 17.0 landed
Avoid parallel nbtree index scan hangs with SAOPs.
- d8adfc18bebf 18.0 landed
- a24bffc021d9 17.0 landed
Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE
- 5a1e6df3b84c 18.0 cited
Enhance nbtree ScalarArrayOp execution.
- 5bf748b86bc6 17.0 cited
Skip checking of scan keys required for directional scan in B-tree
- e0b1ee17dc3a 17.0 cited
Instead of using a numberOfRequiredKeys count to distinguish required
- 7ccaf13a06b8 8.2.0 cited

Hi! Sorry for my later feedback, I didn't have enough time because of my 
work and the conference that was held during these two days.

On 28.03.2025 23:15, Peter Geoghegan wrote:
> On Thu, Mar 27, 2025 at 6:03 PM Alena Rybakina
> <a.rybakina@postgrespro.ru>  wrote:
>> I replied an example like this:
> This example shows costs that are dominated by heap access costs. Both
> the sequential scan and the bitmap heap scan must access 637 heap
> blocks. So I don't think that this is such a great example -- the heap
> accesses are irrelevant from the point of view of assessing how well
> we're modelling index scan related costs.
You are right, the example was not very successful, to be honest I 
couldn't find better example.
>> I think it would be useful to show information that we used an index scan but at the same time we skipped the "region" column and I assume we should output how many distinct values the "region" column had.
>>
>> For example it will look like this "Skip Scan on region (4 distinct values)":
>> What do you think?
> As I said on our call today, I think that we should keep the output
> for EXPLAIN ANALYZE simple. While I'm sympathetic to the idea that we
> should show more information about how quals can be applied in index
> scan node output, that seems like it should be largely independent
> work to me.
>
> Masahiro Ikeda wrote a patch that aimed to improve matters in this
> area some months back. I'm supportive of that (there is definitely
> value in signalling to users that the index might actually look quite
> different to how they imagine it looks, say by having an
> omitted-by-query prefix attribute).
Thank you for your insights and explanation. I agree that this is a 
separate piece of work, and I’ll definitely take a closer
look at Masahiro Ikeda’s contributions.
> I don't exactly know what the most
> useful kind of information to show is with skip scan in place, since
> skip scan makes the general nature of quals (whether a given qual is
> what oracle calls "access predicates", or what oracle calls "filter
> predicates") is made squishy/dynamic by skip scan, in a way that is
> new.
>
> The relationship between the number of values that a skip array ever
> uses, and the number of primitive index scans is quite complicated.
> Sometimes it is actually as simple as your example query, but that's
> often not true. "Index Searches: N" can be affected by:
>
> * The use of SAOP arrays, which also influence primitive scan
> scheduling, in the same way as they have since Postgres 17 -- and can
> be mixed freely with skip arrays.
I see that this work is really voluminous and yes, I agree with you that 
optimization for skipping index scanning
in terms of displaying information about changing quals, if any, can 
even be done using Oracle as an example.
For example, if you introduce a new range ANY(<every possible 'a' 
value>) due to skipping the first column
in the index key, it will be useful for users to know. Or if you define 
a new range that the skip array will consider
during the index scan. Well, and other things related to this, but not 
specifically with your patch, for example
in the case of conflicting conditions or defining boundaries in the case 
of index scanning, like, when a>1 and a>10
we need to scan only a>10.

This optimization was also introduced by you earlier even before the 
patch on skip optimization, but I think it also lacks
some output in the explain.

> * The availability of opclass skipsupport, which makes skip arrays
> generate their element values by addition/subtraction from the current
> array element, rather than using NEXT/PRIOR sentinel keys.
>
> The sentinel keys act as probes that get the next real (non-sentinel)
> value that we need to look up next. Whereas skip support can often
> successfully guess that (for example) the next value in the index
> after 268 is 269, saving a primitive scan each time (this might not
> happen at all, or it might work only some of the time, or it might
> work all of the time).

To be honest, I missed your point here. If possible, could you explain 
it in more detail?

> * Various primitive index scan scheduling heuristics.
>
> Another concern here is that I don't want to invent a special kind of
> "index search" just for skip scan. We're going to show an "Index
> Searches: N" that's greater than 1 with SAOP array keys, too -- which
> don't use skip scan at all (nothing new about that, except for the
> fact that we report the number of searches directly from EXPLAIN
> ANALYZE in Postgres 18).

I agree with you, this is an improvement. "Index Searches: N" shows the 
number of individual index searches, but
it is still not clear enough. Here you can additionally determine what 
happened based on the information about
the number of scanned pages, but with large amounts of data this is 
difficult.

By the way, are you planning to commit additional output to the explain 
about skipped pages? I think, together with
the previous ones, the information about index scanning would be 
sufficient for analysis.

Although I am still learning to understand correctly this information in 
the explain.
By the way, I have long wanted to ask, maybe you can advise something 
else to read on this topic?
Maybe not in this thread, so as not to overload this.
> There really is almost no difference between
> a scan with a skip array and a scan of the same index with a similar
> SAOP array (when each array "contains the same elements", and is used
> to scan the same index, in the same way). That's why the cost model is
> as similar as possible to the Postgres 17 costing of SAOP array scans
> -- it's really the same access method. Reusing the cost model makes
> sense because actual execution times are almost identical when we
> compare a skip array to a SAOP array in the way that I described.

Yes, I agree with that.

> The only advantage that I see from putting something about "skip scan"
> in EXPLAIN ANALYZE is that it is more googleable that way. But it
> seems like "Index Searches: N" is almost as good, most of the time. In
> any case, the fact that we don't need a separate optimizer index
> path/executor node for this is something that I see as a key
> advantage, and something that I'd like EXPLAIN ANALYZE to preserve.
I think it is worth adding "skip scan" information, without it it is 
difficult in my opinion to evaluate
whether this index is effective in comparison with another, looking only 
at the information on
scanned blocks or Index search or I missed something?

I'm not sure that I correctly understood about "separation optimizer 
index path/executor stage". Do you mean that it's better to do all the 
optimizations during index execution rather than during index execution? 
I just remember you mentioned the Goetz Graefe interview on the call and 
and this was precisely his point of view, with which you agree. Is that 
what you mean?

> The problem with advertising that an index scan node is a skip scan
> is: what if it just never skips? Never skipping like this isn't
> necessarily unexpected. And even if it is unexpected, it's not
> necessarily a problem.

I agree that there may not be a place for this to be used, but it is 
worth showing information about it if it does happen.

On the other hand, here we need to be able to determine when it was 
necessary to perform skip scan optimization, but
it was not there. But I'm not sure that it is possible to display this 
in the explain - only when analyzing the received query plan,
namely the buffer statistics.

>> I didn't see any regression tests. Maybe we should add some tests? To be honest I didn't see it mentioned in the commit message but I might have missed something.
> There are definitely new regression tests -- I specifically tried to
> keep the test coverage high, using gcov html reports (like the ones
> from coverage.postgresql.org). The test updates appear towards the end
> of the big patch file, though. Maybe you're just not used to seeing
> tests appear last like this?
>
> I use "git config diff.orderfile ... " to get this behavior. I find it
> useful to put the important changes (particularly header file changes)
> first, and less important changes (like tests) much later.
>
> Thanks for taking a look at my patch!

Yes, indeed, everything is in place, sorry, I didn't notice it right 
away, I'll be more attentive and I'll take note of your advice about the 
git - I haven't used such methods before)

-- 
Regards,
Alena Rybakina
Postgres Professional