Re: Trying out read streams in pgvector (an extension)

Melanie Plageman <melanieplageman@gmail.com>

From: Melanie Plageman <melanieplageman@gmail.com>

To: Thomas Munro <thomas.munro@gmail.com>

Cc: Peter Geoghegan <pg@bowt.ie>, Nazir Bilal Yavuz <byavuz81@gmail.com>, "Jonathan S. Katz" <jkatz@postgresql.org>, pgsql-hackers <pgsql-hackers@postgresql.org>

Date: 2025-12-09T21:42:21Z

Lists: pgsql-hackers

Commits


Add read_stream_{pause,resume}()
- 38229cb90516 19 (unreleased) landed

On Mon, Dec 8, 2025 at 10:47 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> I think it'd be better if that were the consumer's choice.  I don't
> want the consumer to be required to drain the stream before resuming,
> as that'd be an unprincipled stall.  For example, if new WAL arrives
> over the network then I think it should be possible for recovery's
> WAL-powered stream of heap pages to resume looking ahead even if
> recovery hasn't drained the existing stream completely.
>
> 1.  read_stream_resume() as before, but with a new explicit
> read_stream_pause(): if a block number callback would like to report a
> temporary lack of information, it should return
> read_stream_pause(stream), not InvalidBlockNumber.  Then after
> read_stream_resume(stream) is called, the next
> read_stream_next_buffer() enters the lookahead loop again.  While
> paused, if the consumer drains all the existing buffers in the stream
> and then one more, it will receive InvalidBuffer, but if the _resume()
> call is made sooner, the consumer won't ever know about the temporary
> lack of buffers in the stream.

I like this new interface. If the user does want to exhaust the stream
(as was the case with earlier pgvector read stream user code), I
assume you would want to do:

read_stream_pause()
read_stream_reset()
read_stream_resume()
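
To make the semantics discussed above concrete, here is a minimal toy model in plain C. It is not PostgreSQL's read stream code: every name in it (ToyStream, ToySource, toy_pause, toy_resume, toy_reset, toy_next, toy_source_cb, toy_demo_exhaust_then_resume) is hypothetical, block numbers stand in for buffers, and there is no pinning or I/O. It only sketches two behaviours from the proposal, assuming I have understood it correctly: a callback that returns the result of pausing when it temporarily has nothing to report, and a consumer that discards the looked-ahead blocks with pause, reset, resume.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_INVALID   (-1)  /* stands in for InvalidBlockNumber/InvalidBuffer */
#define TOY_QUEUE_MAX 8

typedef struct ToyStream ToyStream;
typedef int (*ToyBlockCB) (ToyStream *stream, void *arg);

struct ToyStream
{
    int         queue[TOY_QUEUE_MAX];   /* looked-ahead blocks, FIFO */
    int         head;
    int         count;
    int         distance;               /* how many blocks to look ahead */
    bool        paused;
    ToyBlockCB  cb;
    void       *cb_arg;
};

/* Stop looking ahead for now; the return value is what a callback returns. */
static int
toy_pause(ToyStream *s)
{
    s->paused = true;
    return TOY_INVALID;
}

/* Allow the next toy_next() call to enter the lookahead loop again. */
static void
toy_resume(ToyStream *s)
{
    s->paused = false;
}

/* Discard everything that was looked ahead but not yet consumed. */
static void
toy_reset(ToyStream *s)
{
    s->head = 0;
    s->count = 0;
}

static int
toy_next(ToyStream *s)
{
    int         blk;

    /* Lookahead loop: skipped entirely while the stream is paused. */
    while (!s->paused && s->count < s->distance)
    {
        blk = s->cb(s, s->cb_arg);
        if (blk == TOY_INVALID)
            break;              /* this toy does not model a permanent end */
        assert(s->count < TOY_QUEUE_MAX);
        s->queue[(s->head + s->count) % TOY_QUEUE_MAX] = blk;
        s->count++;
    }

    if (s->count == 0)
        return TOY_INVALID;     /* drained while paused */

    blk = s->queue[s->head];
    s->head = (s->head + 1) % TOY_QUEUE_MAX;
    s->count--;
    return blk;
}

/* Hypothetical block source that only knows about blocks [0, known). */
typedef struct ToySource
{
    int         next;
    int         known;
} ToySource;

static int
toy_source_cb(ToyStream *s, void *arg)
{
    ToySource  *src = arg;

    if (src->next >= src->known)
        return toy_pause(s);    /* temporary lack of information */
    return src->next++;
}

/* The consumer-side sequence: pause, reset, resume. */
static int
toy_demo_exhaust_then_resume(void)
{
    ToySource   src = {0, 3};
    ToyStream   s = {.distance = 4, .cb = toy_source_cb, .cb_arg = &src};

    (void) toy_next(&s);        /* looks ahead over 0, 1, 2; hands out 0 */
    toy_pause(&s);
    toy_reset(&s);              /* blocks 1 and 2 are thrown away */
    src.known = 6;              /* new information arrives */
    toy_resume(&s);
    return toy_next(&s);        /* lookahead restarts at block 3 */
}
```

In this model a consumer that keeps calling toy_next() while paused sees TOY_INVALID only once the queue is empty, and a toy_resume() issued before that point hides the gap entirely, which is the behaviour described in option 1 above.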

- Melanie