Re: Adding skip scan (including MDAM style range skip scan) to nbtree

Dmitry Dolgov <9erthalion6@gmail.com>

From: Dmitry Dolgov <9erthalion6@gmail.com>
To: Peter Geoghegan <pg@bowt.ie>
Cc: Masahiro.Ikeda@nttdata.com, pgsql-hackers@lists.postgresql.org, Masao.Fujii@nttdata.com
Date: 2024-08-03T19:34:54Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. nbtree: Always set skipScan flag on rescan.

  2. meson: Build numeric.c with -ftree-vectorize.

  3. Fix "variable not found in subplan target lists" in semijoin de-duplication.

  4. Revert "nbtree: Remove useless row compare arg."

  5. nbtree: Remove useless row compare arg.

  6. Prevent premature nbtree array advancement.

  7. nbtree: tighten up array recheck rules.

  8. Avoid treating nonrequired nbtree keys as required.

  9. Adjust overstrong nbtree skip array assertion.

  10. Make NULL tuple values always advance skip arrays.

  11. Avoid extra index searches through preprocessing.

  12. Improve nbtree skip scan primitive scan scheduling.

  13. Further optimize nbtree search scan key comparisons.

  14. Add nbtree skip scan optimization.

  15. Improve nbtree array primitive scan scheduling.

  16. nbtree: Make BTMaxItemSize into object-like macro.

  17. Show index search count in EXPLAIN ANALYZE, take 2.

  18. Make parallel nbtree index scans use an LWLock.

  19. Show index search count in EXPLAIN ANALYZE.

  20. Avoid nbtree parallel scan currPos confusion.

  21. nbtree: Remove useless 'strat' local variable.

  22. Normalize nbtree truncated high key array behavior.

  23. Refactor handling of nbtree array redundancies.

  24. Fix nbtree pgstats accounting with parallel scans.

  25. Avoid parallel nbtree index scan hangs with SAOPs.

  26. Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZE

  27. Enhance nbtree ScalarArrayOp execution.

  28. Skip checking of scan keys required for directional scan in B-tree

  29. Instead of using a numberOfRequiredKeys count to distinguish required

> On Wed, Jun 26, 2024 at 03:16:07PM GMT, Peter Geoghegan wrote:
>
> Loose index scan is a far more specialized technique than skip scan.
> It only applies within special scans that feed into a DISTINCT group
> aggregate. Whereas my skip scan patch isn't like that at all -- it's
> much more general. With my patch, nbtree has exactly the same contract
> with the executor/core code as before. There are no new index paths
> generated by the optimizer to make skip scan work, even. Skip scan
> isn't particularly aimed at improving group aggregates (though the
> benchmark I'll show happens to involve a group aggregate, simply
> because the technique works best with large and expensive index
> scans).

I see that the patch is not supposed to deal with aggregates in any special
way. But from what I understand after a quick review, skip scan is not getting
applied to them if there are no quals in the query (in that case
_bt_preprocess_keys returns before calling _bt_preprocess_array_keys). Yet such
queries could benefit from skipping, I assume they still could be handled by
the machinery introduced in this patch?

> > Currently, there is an assumption that "there will be 10 primitive index scans
> > per skipped attribute". Is any chance to use pg_stats.n_distinct?
>
> It probably makes sense to use pg_stats.n_distinct here. But how?
>
> If the problem is that we're too pessimistic, then I think that this
> will usually (though not always) make us more pessimistic. Isn't that
> the wrong direction to go in? (We're probably also too optimistic in
> some cases, but being too pessimistic is a bigger problem in
> practice.)
>
> For example, your test case involved 11 distinct values in each
> column. The current approach of hard-coding 10 (which is just a
> temporary hack) should actually make the scan look a bit cheaper than
> it would if we used the true ndistinct.
>
> Another underlying problem is that the existing SAOP costing really
> isn't very accurate, without skip scan -- that's a big source of the
> pessimism with arrays/skipping. Why should we be able to get the true
> number of primitive index scans just by multiplying together each
> omitted prefix column's ndistinct? That approach is good for getting
> the worst case, which is probably relevant -- but it's probably not a
> very good assumption for the average case. (Though at least we can cap
> the total number of primitive index scans to 1/3 of the total number
> of pages in the index in btcostestimate, since we have guarantees
> about the worst case as of Postgres 17.)

Do I understand correctly, that the only way how multiplying ndistincts could
produce too pessimistic results is when there is a correlation between distinct
values? Can one benefit from the extended statistics here?

And while we're at it, I think it would be great if the implementation will
allow some level of visibility about the skip scan. From what I see, currently
it's by design impossible for users to tell whether something was skipped or
not. But when it comes to planning and estimates, maybe it's not a bad idea to
let explain analyze show something like "expected number of primitive scans /
actual number of primitive scans".