Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Matthias van de Meent <boekewurm+postgres@gmail.com>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
nbtree: Always set skipScan flag on rescan.
- 454c046094ab 19 (unreleased) landed
- bee763aea13f 18.0 landed
-
meson: Build numeric.c with -ftree-vectorize.
- 9016fa7e3bcd 19 (unreleased) cited
-
Fix "variable not found in subplan target lists" in semijoin de-duplication.
- b8a1bdc458e3 19 (unreleased) cited
-
Revert "nbtree: Remove useless row compare arg."
- dd2ce3792754 18.0 landed
-
nbtree: Remove useless row compare arg.
- 54c6ea8c81db 18.0 cited
-
Prevent premature nbtree array advancement.
- 5f4d98d4f371 18.0 landed
-
nbtree: tighten up array recheck rules.
- 7e25c9363a82 18.0 landed
-
Avoid treating nonrequired nbtree keys as required.
- 0f08df406822 18.0 landed
-
Adjust overstrong nbtree skip array assertion.
- 9d924dbb3710 18.0 landed
-
Make NULL tuple values always advance skip arrays.
- b75fedcab791 18.0 cited
-
Avoid extra index searches through preprocessing.
- b3f1a13f22f9 18.0 landed
-
Improve nbtree skip scan primitive scan scheduling.
- 21a152b37f36 18.0 landed
-
Further optimize nbtree search scan key comparisons.
- 8a510275dd6b 18.0 landed
-
Add nbtree skip scan optimization.
- 92fe23d93aa3 18.0 landed
-
Improve nbtree array primitive scan scheduling.
- 9a2e2a285a14 18.0 landed
-
nbtree: Make BTMaxItemSize into object-like macro.
- 426ea611171d 18.0 landed
-
Show index search count in EXPLAIN ANALYZE, take 2.
- 0fbceae841cb 18.0 landed
-
Make parallel nbtree index scans use an LWLock.
- 67fc4c9fd7fa 18.0 landed
-
Show index search count in EXPLAIN ANALYZE.
- 5ead85fbc811 18.0 landed
-
Avoid nbtree parallel scan currPos confusion.
- b5ee4e52026b 18.0 cited
-
nbtree: Remove useless 'strat' local variable.
- b6558e4f837e 18.0 landed
-
Normalize nbtree truncated high key array behavior.
- 79fa7b3b1a44 18.0 landed
-
Refactor handling of nbtree array redundancies.
- b524974106ac 18.0 landed
-
Fix nbtree pgstats accounting with parallel scans.
- c00c54a9ac1e 18.0 landed
- fb4f5e58af97 17.0 landed
-
Avoid parallel nbtree index scan hangs with SAOPs.
- d8adfc18bebf 18.0 landed
- a24bffc021d9 17.0 landed
-
Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE
- 5a1e6df3b84c 18.0 cited
-
Enhance nbtree ScalarArrayOp execution.
- 5bf748b86bc6 17.0 cited
-
Skip checking of scan keys required for directional scan in B-tree
- e0b1ee17dc3a 17.0 cited
-
Instead of using a numberOfRequiredKeys count to distinguish required
- 7ccaf13a06b8 8.2.0 cited
On Tue, 1 Apr 2025 at 21:02, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Tue, Apr 1, 2025 at 10:40 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > > When nbtree is passed input scan keys derived from a
> > > query predicate "WHERE b = 5", new nbtree preprocessing steps now output
> > > "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys.
> >
> > [...] new nbtree preprocessing steps now output +the equivalent of+ [...]
> >
> > > Preprocessing can freely add a skip array before or after any input
> > > ScalarArrayOp arrays.
> >
> > This implies skip arrays won't be generated for WHERE b = 5 (a
> > non-SAOP scankey) or WHERE b < 3 (not SAOP either), which is probably
> > not intentional, so a rewording would be appreciated.
>
> I don't agree. Yes, that sentence (taken in isolation) does not make
> it 100% unambiguous. But, would anybody ever actually be misled? Even
> once, ever? The misinterpretation of my words that you're concerned
> about is directly contradicted by the whole opening paragraph. Plus,
> it just doesn't make any sense. Obviously, you yourself never actually
> interpreted it that way. Right?
That's built on my knowledge about the internals of this patch ahead
of reading the message, not on a clean-sleet interpretation.
> The paragraph that this sentence appears in is all about the various
> ways in which SAOP arrays and skip arrays are the same thing, except
> at the very lowest level of the _bt_advance_array_keys code. I think
> that that context makes it particularly unlikely that anybody would
> ever think that I mean to imply something about the ways in which
> non-array keys can be composed alongside skip arrays.
>
> I'm pushing back here because I think that there's a real cost to
> using overly defensive language, aimed at some imaginary person.
While I agree that there is such a cost, I don't think that this is
too far fetched. They are not just added when we have SAOP scankeys,
and I think the non-SAOP case is one of the most important areas where
this patch improves performance. Yes, this patch improves performance
for queries with only SAOP on non-first keys, but I've seen more
non-SAOP queries where this patch would improve performance than
similar queries but with SAOP.
> > // nbtpreprocesskeys.c
> >
> > > +static bool _bt_s**parray_shrink
> >
> > I'd like to understand the "shrink" here, as it's not entirely clear to me.
> > The functions are exclusively called through dispatch in
> > _bt_compare_array_scankey_args, and I'd expected that naming to be
> > extended to these functions.
>
> I don't see the problem?
It's mostly as an observation (and not problem per se) that "compare"
(which sounds like a pure function that doens't modify anything, e.g.
_bt_compare) is used to dispatch to "shrink" (which does sound like
it'd modify something).
> > > + * This qual now becomes "WHERE x = ANY('{every possible x value}') and y = 4"
> >
> > I understand what you're going for, but a reference that indexed NULLs
> > are still handled correctly (rather than removed by the filter) would
> > be appreciated, as the indicated substitution would remove NULL
> > values. (This doesn't have to be much more than a footnote.)
> Again, it comes down to what the reader might actually be confused by,
> in the real world. Is it really plausible that I could ever commit a
> skip scan patch that wholly forgot to deal with NULLs sensibly? Would
> you personally ever think that I could make such an obvious blunder in
> a committed patch? And if you did, how long would it take you to
> figure out that there was no such oversight?
My comment is not about your potential future actions, but rather what
any future developer or committer working on this code might think and
worry about when reading this. = ANY {} constructs *always* have NOT
NULL behaviour, just like any other operator clause that isn't "IS
NULL", so clarifying that this is only similar -and does not behave
the same in important edge cases- is IMO important. Not specifically
for you, but for any other developer trying to get a correct
understanding of how this works and why it is correct.
> > > +#if 0
> > > + /* Could be a redundant input scan key, so can't do this: */
> > > + Assert(inkey->sk_strategy == BTEqualStrategyNumber ||
> > > + (inkey->sk_flags & SK_SEARCHNULL));
> > > +#endif
> >
> > I think this should be removed?
>
> Why? This is me expressing that I wish I could write this assertion,
> but it won't quite work. I cannot rule out rare corner-cases involving
> a contradictory pair of input keys, only one of which is a = key (the
> other might be some kind of inequality, which makes this would-be
> assertion not quite work). (You'll see similar "almost assertions"
> from time to time, in different parts of the codebase.)
It is my understanding that those are mostly historical artifacts of
the code base, and not systems in active development. Their rarety
makes it difficult to put numbers on, but IIRC at least one such case
was recently removed for bitrot and apparent lack of use in years.
> > > +#ifdef DEBUG_DISABLE_SKIP_SCAN
> >
> > I noticed this one and only reference to DEBUG_DISABLE_SKIP_SCAN. Are
> > you planning on keeping that around, or is this a leftover?
>
> I deliberately left this in place, just in case somebody wants to see
> what happens when preprocessing stops generating skip arrays entirely.
> Without this, it's not too obvious that it can just be turned off by
> forcing _bt_num_array_keys to return early.
Ah, I see.
> > > prev_numSkipArrayKeys, *numSkipArrayKeys
> >
> > I'm a bit confused why we primarily operate on *numSkipArrayKeys,
> > rather than a local variable that we store in *numSkipArrayKeys once
> > we know we can generate skip keys. I.e., I'd expected something more
> > in line with the following snippet (incomplete code blocks):
> > I think this is easier for the compiler to push the store operation
> > out of the loop (assuming it's optimizable at all; but even if it
> > isn't it removes the load of *numSkipArrayKeys from the hot path).
>
> What if I just had a local copy of numSkipArrayKeys, and copied back
> into caller's arg when the function returns? We'll still need a
> prev_numSkipArrayKeys under this scheme, but we won't have to set the
> caller's pointer until right at the end anymore (which, I agree, seems
> like it might be worth avoiding).
That's a nice alternative too, indeed.
> > I think the increment/decrement callbacks for skipsupport should
> > explicitly check (by e.g. Assert) for NULL (or alternatively: same
> > value) returns on overflow, and the API definition should make this
> > explicit.
>
> But the API definition *does* specifically address the opclass
> author's responsibilities around NULLs? It specifically says that it's
> not up to the opclass author to think about them at all.
Yes. What I'm suggesting is to make the contract enforceable to a
degree. If it was defined to "return (Datum) 0 on overflow", then that
could be Assert()ed; and code that does leak memory could get detected
by static analysis tools in the function scope because the allocation
didn't get returned, but with this definition returning an allocation
is never detected because that's not part of the contract.
> > The current system is quite easy to build a leaking
> > implementation for. Sure, there are other easy ways to break this, but
> > I think it's an easy API modification that makes things just that bit
> > safer.
>
> How can I do that? The callbacks themselves (the callbacks for
> functions that are called as the scan progresses) don't use the fmgr
> interface.
You could Assert(inc_sk_argument == (Datum) 0) in the oflow branch?
I'm certain that (Datum) 0 is an invalid representation of a pointer,
so we know that no allocated value is returned (be it new or
pre-existing).
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)