Re: index prefetching
Peter Geoghegan <pg@bowt.ie>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
aio: io_uring: Trigger async processing for large IOs
- a9ee66881744 19 (unreleased) landed
-
read stream: Split decision about look ahead for AIO and combining
- 8ca147d582a5 19 (unreleased) landed
-
read_stream: Only increase read-ahead distance when waiting for IO
- f63ca3379025 19 (unreleased) landed
-
read_stream: Prevent distance from decaying too quickly
- 6e36930f9aaf 19 (unreleased) landed
-
Reduce ExecSeqScan* code size using pg_assume()
- b227b0bb4e03 19 (unreleased) cited
-
Fix rare bug in read_stream.c's split IO handling.
- b421223172a2 19 (unreleased) cited
-
Fix multiranges to behave more like dependent types.
- 3e8235ba4f9c 17.0 cited
-
Add EXPLAIN (MEMORY) to report planner memory consumption
- 5de890e3610d 17.0 cited
-
Optimize nbtree backward scan boundary cases.
- c9c0589fda0e 17.0 cited
-
Increment xactCompletionCount during subtransaction abort.
- 90c885cdab8b 14.0 cited
-
Add nbtree Valgrind buffer lock checks.
- 4a70f829d86c 14.0 cited
-
Add nbtree high key "continuescan" optimization.
- 29b64d1de7c7 12.0 cited
-
Reduce pinning and buffer content locking for btree scans.
- 2ed5b87f96d4 9.5.0 cited
-
Teach btree to handle ScalarArrayOpExpr quals natively.
- 9e8da0f75731 9.2.0 cited
On Wed, Dec 17, 2025 at 12:19 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> create table t (pk integer primary key, payload text default repeat('x',
> 1000)) with (fillfactor=10);
> insert into t values (generate_series(1,10000000))
>
> So it creates table with size 80Gb (160 after vacuum) which doesn't fit
> in RAM.
160 after VACUUM? What do you mean?
> but what confuses me is that they do not depend on
> `effective_io_concurrency`.
You did change other settings, right? You didn't just use the default
shared_buffers, for example? (Sorry, I have to ask.)
> Moreover with `enable_indexscan_prefetch=off` results are the same.
It's quite unlikely that the current heuristics that trigger
prefetching would have ever allowed any prefetching, for queries such
as these.
The exact rule right now is that we don't even begin prefetching until
we've already read at least one index leaf page, and have to read
another one. So it's impossible to use prefetching with a LIMIT of 1,
with queries such as these. It's highly unlikely that you'd see any
benefits from prefetching even with LIMIT 100 (usually we wouldn't
even begin prefetching).
> Also I expected that the best effect of index prefetching should be for
> larger limit (accessing more heap pages). But as you see - it is not true.
>
> May we there is something wrong with my test scenario.
I could definitely believe that the new amgetbatch interface is
noticeably faster with range queries. Maybe 5% - 10% faster (even
without using the heap-buffer-locking optimization we've talked about
on this thread, which you can't have used here because I haven't
posted it to the list just yet). But a near 2x improvement wildly
exceeds my expectations. Honestly, I have no idea why the patch is so
much faster, and suspect an invalid result.
It might make sense for you to try it again with just the first patch
applied (the patch that adds the basic table AM and index AM interface
revisions, and makes nbtree supply its own amgetbatch/replaces
btgetbatch with btgettuple). I suppose it's possible that Andres'
patch 0004 somehow played some role here, since that is independently
useful work (I don't quite recall the details of where else that might
be useful right now). But that's just a wild guess.
> It will be nice to get some information about efficiency of prefetch,
> for example add `pefetch` option to explain: `explain
> (analyze,buffers,prefetch) ...`
> I think that in `pgaio_io_wait` we can distinguish IO operations which
> are completed without waiting and can be considered as prefetch hit.
> Right now it is hard to understand without debugger whether prefetch is
> perfromed at all.
Tomas did write a patch for that, but it isn't particularly well
optimized. I have mostly avoided using it for that reason. Basic
performance validation of the patch set is really hard in general, and
I've found it easier to just be extremely paranoid.
--
Peter Geoghegan