Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Nico Williams <nico@cryptonector.com>
From: Nico Williams <nico@cryptonector.com>
To: Vik Fearing <vik@postgresfriends.org>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, Aleksander Alekseev <aleksander@timescale.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Andrei Lepikhov <lepihov@gmail.com>, wenhui qiu <qiuwenhuifx@gmail.com>
Date: 2025-05-16T21:53:29Z
Lists: pgsql-hackers
On Fri, May 16, 2025 at 11:10:49PM +0200, Vik Fearing wrote: > Isn't this a job for <fetch first clause>? > > Example: > > SELECT ... > FROM ... JOIN ... > FETCH SAMPLE FIRST 10 ROWS ONLY > > Then the nodeLimit could do some sort of reservoir sampling. The query might return fewer than N rows. What reservoir sampling requires is this bit of state: the count of input rows so far. The only way I know of to keep such state in a SQL query is with a RECURSIVE CTE, but unfortunately that would require unbounded CTE size, and it would require a way to query next rows one per-iteration. Nico --