Thread
-
Re: index prefetching
Tomas Vondra <tomas@vondra.me> — 2025-12-18T14:40:59Z
On 12/18/25 14:57, Konstantin Knizhnik wrote: > > On 17/12/2025 9:54 PM, Tomas Vondra wrote: >> On 12/17/25 20:30, Andres Freund wrote: >>> Hi, >>> >>> On 2025-12-17 13:49:43 -0500, Peter Geoghegan wrote: >>>> On Wed, Dec 17, 2025 at 12:19 PM Konstantin Knizhnik >>>> <knizhnik@garret.ru> wrote: >>>>> Moreover with `enable_indexscan_prefetch=off` results are the same. >>>> It's quite unlikely that the current heuristics that trigger >>>> prefetching would have ever allowed any prefetching, for queries such >>>> as these. >>>> >>>> The exact rule right now is that we don't even begin prefetching until >>>> we've already read at least one index leaf page, and have to read >>>> another one. So it's impossible to use prefetching with a LIMIT of 1, >>>> with queries such as these. It's highly unlikely that you'd see any >>>> benefits from prefetching even with LIMIT 100 (usually we wouldn't >>>> even begin prefetching). >>> Note that due to the tuple size and fillfactor in Konstantin's >>> workload, there >>> will be one tuple per page... That should allow for some prefetching. >>> >> Yes, but that's in the heap. The mechanism Peter described is about leaf >> pages in the index, and the index has the usual fillfactor. So there'll >> be many index entries per leaf. >> > I slightly change my benchmark setup: > > create table t (pk integer, sk integer, payload text default repeat('x', > 1000)) with (fillfactor=10); > insert into t values (generate_series(1,10000000),random()*10000000); > create index on t(sk); > > select.sql: > > \set sk random(1, 10000000) > select * from t where sk >= :sk order by sk limit N; > > You are right. There is almost no effect of prefetch for limit=100, but > ~2x times improvement for limit=1000: > > eio\limit 1 100 1000 > 10 11102 142 28 > 0 11419 137 14 > > master: > limit 1 100 1000 > 11480 130 13 > > One of the motivation of my experiments was to check that there is no > degrade of performance because of batching. > And it is nice that there is no performance penalty here. > Still it is not quite clear to me why there is no any positive effect > for LIMIT 100. The technical reason is that batch_getnext() does this: /* Delay initializing stream until reading from scan's second batch */ if (priorbatch && !scan->xs_heapfetch->rs && !batchqueue->disabled && enable_indexscan_prefetch) scan->xs_heapfetch->rs = read_stream_begin_relation(READ_STREAM_DEFAULT, NULL, ....); which means we only create the read_stream (which is what enables the prefetching) only when creating the second batch. And with LIMIT 100 we likely read just a single leaf page (=batch) most of the time, which means no read_stream and thus no prefetching. You can try disabling this "priorbatch" condition, so that the read_stream gets created right away. > Reading 100 random heap pages definitely should take advantages of AIO. > We have also implemented prefetching for index only scan in Neon and > here effect for similar query is quite noticeable (~3x times). > But in Neon architecture prices of IO is much higher because requires > network communication with page server. > True, but only if the data is not already in memory / shared buffers. IIRC this "priorbatch" logic mitigates regressions for cached workloads, because the read_stream initialization is expensive enough to hurt small queries when no I/O is needed. Maybe the tradeoff is different for Neon, which probably can't rely on cache that much? It's also true tying this to the number of batches is a bit coarse, because the batch size can vary a lot. It can be a couple items or hundreds of items, easily. I believe we're open to alternative ideas. regards -- Tomas Vondra