Re: Should we optimize the `ORDER BY random() LIMIT x` case?

Vik Fearing <vik@postgresfriends.org>

From: Vik Fearing <vik@postgresfriends.org>
To: Tom Lane <tgl@sss.pgh.pa.us>, Aleksander Alekseev <aleksander@timescale.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Andrei Lepikhov <lepihov@gmail.com>, wenhui qiu <qiuwenhuifx@gmail.com>
Date: 2025-05-16T21:10:49Z
Lists: pgsql-hackers

On 16/05/2025 15:01, Tom Lane wrote:
> Aleksander Alekseev <aleksander@timescale.com> writes:
>> If I'm right about the limitations of aggregate functions and SRFs
>> this leaves us the following options:
>> 1. Changing the constraints of aggregate functions or SRFs. However I
>> don't think we want to do it for such a single niche scenario.
>> 2. Custom syntax and a custom node.
>> 3. To give up
> Seems to me the obvious answer is to extend TABLESAMPLE (or at least, some
> of the tablesample methods) to allow it to work on a subquery.


Isn't this a job for <fetch first clause>?


Example:

SELECT ...
FROM ... JOIN ...
FETCH SAMPLE FIRST 10 ROWS ONLY


Then nodeLimit could do some sort of reservoir sampling.
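
To make "some sort of reservoir sampling" concrete, here is a minimal sketch of the classic single-pass variant (Algorithm R). It is in Python purely for illustration. The function name `reservoir_sample` and its parameters are hypothetical, and nothing here reflects how nodeLimit works today or how a patch would actually be written.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    rows: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k rows chosen uniformly at random, reading rows once.

    Hypothetical helper for illustration only (Algorithm R).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    # Stand-in for the rows coming out of the join in the example above.
    print(reservoir_sample(range(1000), 10))
```

The point of interest for a Limit-style node is that only k rows are held in memory and the input is consumed exactly once, with no sort step. The trade-off is that the whole input still has to be read before the first row can be returned.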


There are several enhancements to <fetch first clause> coming down the
pipe; this could be one of them.

-- 

Vik Fearing