Re: Trying out read streams in pgvector (an extension)
From: Peter Geoghegan <pg@bowt.ie>
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Melanie Plageman <melanieplageman@gmail.com>,
Nazir Bilal Yavuz <byavuz81@gmail.com>, "Jonathan S. Katz" <jkatz@postgresql.org>,
pgsql-hackers <pgsql-hackers@postgresql.org>
Date: 2025-12-09T22:38:10Z
Lists: pgsql-hackers
Commits
- Add read_stream_{pause,resume}(): 38229cb90516, 19 (unreleased), landed

On Mon, Dec 8, 2025 at 10:47 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Yielding just because you've scanned N index pages/tuples/whatever is
> harder to think about. The stream shouldn't get far ahead unless it's
> recently been useful for I/O concurrency (though optimal distance
> heuristics are an open problem), but in this case a single invocation
> of the block number callback can call ReadBuffer() an arbitrary number
> of times, filtering out all the index tuples as it rampages through
> the whole index IIUC. I see why you might want to yield periodically
> if you can, but I also wonder how much that can really help if you
> still have to pick up where you left off next time.

I think of it as a necessary precaution against pathological behavior
where the amount of memory used to cache matching tuples/TIDs gets out
of hand. There's no specific reason to expect that to happen (or no
good reason). But I'm pretty sure that it'll prove necessary to pay
non-zero attention to how much work has been done since the last time
we returned a tuple (when there's a tuple available to return).

> I guess it
> depends on the distribution of matches.

To be clear, I haven't done any kind of modelling of the problems in
this area. Once I do that (in 2026), I'll be able to say more about
the requirements. Maybe Tomas could take a look sooner?

Right now my focus is on getting the basic interfaces/API revisions in
better shape. And avoiding regressions while doing so.

--
Peter Geoghegan
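
One way to picture the precaution discussed in this message is a batch-filling loop that tracks how many items it has examined since it last handed anything back, and yields once that count passes a budget, but only when at least one match is available to return. The following is a minimal standalone sketch under that assumption. It is not code from this thread, from pgvector, or from PostgreSQL: `SketchScan`, `sketch_fill_batch`, `capacity`, and `work_budget` are hypothetical names, and a plain boolean array stands in for the index tuples being filtered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scan state; none of these names exist in PostgreSQL. */
typedef struct SketchScan
{
    const bool *matches;    /* matches[i]: does item i pass the filter? */
    size_t      nitems;     /* total number of items to scan */
    size_t      next;       /* position to resume from on the next call */
} SketchScan;

/*
 * Store the positions of matching items in out[], at most "capacity" of
 * them, which bounds the memory used to cache matches.
 *
 * The loop also counts items examined during this call.  Once that count
 * reaches "work_budget" and at least one match has been collected, it
 * stops early so the caller can return something.  With no match in hand
 * it keeps scanning, because yielding would return nothing useful.
 *
 * Returns the number of matches stored; scan->next records where the
 * next call picks up.
 */
size_t
sketch_fill_batch(SketchScan *scan, size_t *out, size_t capacity,
                  size_t work_budget)
{
    size_t      nfound = 0;
    size_t      work = 0;

    assert(capacity > 0);

    while (scan->next < scan->nitems && nfound < capacity)
    {
        /* Yield only when there is something to hand back. */
        if (nfound > 0 && work >= work_budget)
            break;

        if (scan->matches[scan->next])
            out[nfound++] = scan->next;

        scan->next++;
        work++;
    }

    return nfound;
}
```

A caller would invoke `sketch_fill_batch()` repeatedly until it returns 0. How to choose the budget, and whether to count pages, tuples, or something else, depends on the distribution of matches, which the thread leaves open.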