Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Alena Rybakina <a.rybakina@postgrespro.ru>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
nbtree: Always set skipScan flag on rescan.
- 454c046094ab 19 (unreleased) landed
- bee763aea13f 18.0 landed
-
meson: Build numeric.c with -ftree-vectorize.
- 9016fa7e3bcd 19 (unreleased) cited
-
Fix "variable not found in subplan target lists" in semijoin de-duplication.
- b8a1bdc458e3 19 (unreleased) cited
-
Revert "nbtree: Remove useless row compare arg."
- dd2ce3792754 18.0 landed
-
nbtree: Remove useless row compare arg.
- 54c6ea8c81db 18.0 cited
-
Prevent premature nbtree array advancement.
- 5f4d98d4f371 18.0 landed
-
nbtree: tighten up array recheck rules.
- 7e25c9363a82 18.0 landed
-
Avoid treating nonrequired nbtree keys as required.
- 0f08df406822 18.0 landed
-
Adjust overstrong nbtree skip array assertion.
- 9d924dbb3710 18.0 landed
-
Make NULL tuple values always advance skip arrays.
- b75fedcab791 18.0 cited
-
Avoid extra index searches through preprocessing.
- b3f1a13f22f9 18.0 landed
-
Improve nbtree skip scan primitive scan scheduling.
- 21a152b37f36 18.0 landed
-
Further optimize nbtree search scan key comparisons.
- 8a510275dd6b 18.0 landed
-
Add nbtree skip scan optimization.
- 92fe23d93aa3 18.0 landed
-
Improve nbtree array primitive scan scheduling.
- 9a2e2a285a14 18.0 landed
-
nbtree: Make BTMaxItemSize into object-like macro.
- 426ea611171d 18.0 landed
-
Show index search count in EXPLAIN ANALYZE, take 2.
- 0fbceae841cb 18.0 landed
-
Make parallel nbtree index scans use an LWLock.
- 67fc4c9fd7fa 18.0 landed
-
Show index search count in EXPLAIN ANALYZE.
- 5ead85fbc811 18.0 landed
-
Avoid nbtree parallel scan currPos confusion.
- b5ee4e52026b 18.0 cited
-
nbtree: Remove useless 'strat' local variable.
- b6558e4f837e 18.0 landed
-
Normalize nbtree truncated high key array behavior.
- 79fa7b3b1a44 18.0 landed
-
Refactor handling of nbtree array redundancies.
- b524974106ac 18.0 landed
-
Fix nbtree pgstats accounting with parallel scans.
- c00c54a9ac1e 18.0 landed
- fb4f5e58af97 17.0 landed
-
Avoid parallel nbtree index scan hangs with SAOPs.
- d8adfc18bebf 18.0 landed
- a24bffc021d9 17.0 landed
-
Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE
- 5a1e6df3b84c 18.0 cited
-
Enhance nbtree ScalarArrayOp execution.
- 5bf748b86bc6 17.0 cited
-
Skip checking of scan keys required for directional scan in B-tree
- e0b1ee17dc3a 17.0 cited
-
Instead of using a numberOfRequiredKeys count to distinguish required
- 7ccaf13a06b8 8.2.0 cited
Hi! Sorry for my later feedback, I didn't have enough time because of my work and the conference that was held during these two days. On 28.03.2025 23:15, Peter Geoghegan wrote: > On Thu, Mar 27, 2025 at 6:03 PM Alena Rybakina > <a.rybakina@postgrespro.ru> wrote: >> I replied an example like this: > This example shows costs that are dominated by heap access costs. Both > the sequential scan and the bitmap heap scan must access 637 heap > blocks. So I don't think that this is such a great example -- the heap > accesses are irrelevant from the point of view of assessing how well > we're modelling index scan related costs. You are right, the example was not very successful, to be honest I couldn't find better example. >> I think it would be useful to show information that we used an index scan but at the same time we skipped the "region" column and I assume we should output how many distinct values the "region" column had. >> >> For example it will look like this "Skip Scan on region (4 distinct values)": >> What do you think? > As I said on our call today, I think that we should keep the output > for EXPLAIN ANALYZE simple. While I'm sympathetic to the idea that we > should show more information about how quals can be applied in index > scan node output, that seems like it should be largely independent > work to me. > > Masahiro Ikeda wrote a patch that aimed to improve matters in this > area some months back. I'm supportive of that (there is definitely > value in signalling to users that the index might actually look quite > different to how they imagine it looks, say by having an > omitted-by-query prefix attribute). Thank you for your insights and explanation. I agree that this is a separate piece of work, and I’ll definitely take a closer look at Masahiro Ikeda’s contributions. > I don't exactly know what the most > useful kind of information to show is with skip scan in place, since > skip scan makes the general nature of quals (whether a given qual is > what oracle calls "access predicates", or what oracle calls "filter > predicates") is made squishy/dynamic by skip scan, in a way that is > new. > > The relationship between the number of values that a skip array ever > uses, and the number of primitive index scans is quite complicated. > Sometimes it is actually as simple as your example query, but that's > often not true. "Index Searches: N" can be affected by: > > * The use of SAOP arrays, which also influence primitive scan > scheduling, in the same way as they have since Postgres 17 -- and can > be mixed freely with skip arrays. I see that this work is really voluminous and yes, I agree with you that optimization for skipping index scanning in terms of displaying information about changing quals, if any, can even be done using Oracle as an example. For example, if you introduce a new range ANY(<every possible 'a' value>) due to skipping the first column in the index key, it will be useful for users to know. Or if you define a new range that the skip array will consider during the index scan. Well, and other things related to this, but not specifically with your patch, for example in the case of conflicting conditions or defining boundaries in the case of index scanning, like, when a>1 and a>10 we need to scan only a>10. This optimization was also introduced by you earlier even before the patch on skip optimization, but I think it also lacks some output in the explain. > * The availability of opclass skipsupport, which makes skip arrays > generate their element values by addition/subtraction from the current > array element, rather than using NEXT/PRIOR sentinel keys. > > The sentinel keys act as probes that get the next real (non-sentinel) > value that we need to look up next. Whereas skip support can often > successfully guess that (for example) the next value in the index > after 268 is 269, saving a primitive scan each time (this might not > happen at all, or it might work only some of the time, or it might > work all of the time). To be honest, I missed your point here. If possible, could you explain it in more detail? > * Various primitive index scan scheduling heuristics. > > Another concern here is that I don't want to invent a special kind of > "index search" just for skip scan. We're going to show an "Index > Searches: N" that's greater than 1 with SAOP array keys, too -- which > don't use skip scan at all (nothing new about that, except for the > fact that we report the number of searches directly from EXPLAIN > ANALYZE in Postgres 18). I agree with you, this is an improvement. "Index Searches: N" shows the number of individual index searches, but it is still not clear enough. Here you can additionally determine what happened based on the information about the number of scanned pages, but with large amounts of data this is difficult. By the way, are you planning to commit additional output to the explain about skipped pages? I think, together with the previous ones, the information about index scanning would be sufficient for analysis. Although I am still learning to understand correctly this information in the explain. By the way, I have long wanted to ask, maybe you can advise something else to read on this topic? Maybe not in this thread, so as not to overload this. > There really is almost no difference between > a scan with a skip array and a scan of the same index with a similar > SAOP array (when each array "contains the same elements", and is used > to scan the same index, in the same way). That's why the cost model is > as similar as possible to the Postgres 17 costing of SAOP array scans > -- it's really the same access method. Reusing the cost model makes > sense because actual execution times are almost identical when we > compare a skip array to a SAOP array in the way that I described. Yes, I agree with that. > The only advantage that I see from putting something about "skip scan" > in EXPLAIN ANALYZE is that it is more googleable that way. But it > seems like "Index Searches: N" is almost as good, most of the time. In > any case, the fact that we don't need a separate optimizer index > path/executor node for this is something that I see as a key > advantage, and something that I'd like EXPLAIN ANALYZE to preserve. I think it is worth adding "skip scan" information, without it it is difficult in my opinion to evaluate whether this index is effective in comparison with another, looking only at the information on scanned blocks or Index search or I missed something? I'm not sure that I correctly understood about "separation optimizer index path/executor stage". Do you mean that it's better to do all the optimizations during index execution rather than during index execution? I just remember you mentioned the Goetz Graefe interview on the call and and this was precisely his point of view, with which you agree. Is that what you mean? > The problem with advertising that an index scan node is a skip scan > is: what if it just never skips? Never skipping like this isn't > necessarily unexpected. And even if it is unexpected, it's not > necessarily a problem. I agree that there may not be a place for this to be used, but it is worth showing information about it if it does happen. On the other hand, here we need to be able to determine when it was necessary to perform skip scan optimization, but it was not there. But I'm not sure that it is possible to display this in the explain - only when analyzing the received query plan, namely the buffer statistics. >> I didn't see any regression tests. Maybe we should add some tests? To be honest I didn't see it mentioned in the commit message but I might have missed something. > There are definitely new regression tests -- I specifically tried to > keep the test coverage high, using gcov html reports (like the ones > from coverage.postgresql.org). The test updates appear towards the end > of the big patch file, though. Maybe you're just not used to seeing > tests appear last like this? > > I use "git config diff.orderfile ... " to get this behavior. I find it > useful to put the important changes (particularly header file changes) > first, and less important changes (like tests) much later. > > Thanks for taking a look at my patch! Yes, indeed, everything is in place, sorry, I didn't notice it right away, I'll be more attentive and I'll take note of your advice about the git - I haven't used such methods before) -- Regards, Alena Rybakina Postgres Professional