Re: Should we optimize the `ORDER BY random() LIMIT x` case?

Aleksander Alekseev <aleksander@timescale.com>

From: Aleksander Alekseev <aleksander@timescale.com>

To: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>

Cc: Andrei Lepikhov <lepihov@gmail.com>, Nico Williams <nico@cryptonector.com>, Tom Lane <tgl@sss.pgh.pa.us>, wenhui qiu <qiuwenhuifx@gmail.com>, Vik Fearing <vik@postgresfriends.org>

Date: 2025-05-20T10:46:38Z

Lists: pgsql-hackers

Andrei,

> > ```
> > -- imagine replacing inefficient array_sample(array_agg(t), 10)
> > -- with more efficient array_sample_reservoir(t, 10)
> > SELECT (unnest(agg)).* AS k FROM
> > (  SELECT array_sample(array_agg(t), 10) AS agg FROM (
> >     ... here goes the subquery ...
> >     ) AS t
> > );
> > ```
> >
> > ... if only we supported such a column expansion for not registered
> > records. Currently such a query fails with:
> >
> > ```
> > ERROR:  record type has not been registered
> > ```
> I know about this issue. Having resolved it in a limited number of local
> cases (like FDW push-down of row types), I still do not have a universal
> solution worth proposing upstream. Do you have any public implementation
> of the array_sample_reservoir to play with?

array_sample_reservoir() is purely a figment of my imagination at the
moment. Semantically it does the same as array_sample(array_agg(t), N)
except the fact that array_sample(..., N) requires the array to have
at least N items. You can experiment with array_sample(array_agg(...),
N) as long as the subquery returns much more than N rows.

-- 
Best regards,
Aleksander Alekseev