Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Aleksander Alekseev <aleksander@timescale.com>
From: Aleksander Alekseev <aleksander@timescale.com>
To: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Cc: Andrei Lepikhov <lepihov@gmail.com>,
Nico Williams <nico@cryptonector.com>, Tom Lane <tgl@sss.pgh.pa.us>, wenhui qiu <qiuwenhuifx@gmail.com>, Vik Fearing <vik@postgresfriends.org>
Date: 2025-05-20T10:46:38Z
Lists: pgsql-hackers
Andrei, > > ``` > > -- imagine replacing inefficient array_sample(array_agg(t), 10) > > -- with more efficient array_sample_reservoir(t, 10) > > SELECT (unnest(agg)).* AS k FROM > > ( SELECT array_sample(array_agg(t), 10) AS agg FROM ( > > ... here goes the subquery ... > > ) AS t > > ); > > ``` > > > > ... if only we supported such a column expansion for not registered > > records. Currently such a query fails with: > > > > ``` > > ERROR: record type has not been registered > > ``` > I know about this issue. Having resolved it in a limited number of local > cases (like FDW push-down of row types), I still do not have a universal > solution worth proposing upstream. Do you have any public implementation > of the array_sample_reservoir to play with? array_sample_reservoir() is purely a figment of my imagination at the moment. Semantically it does the same as array_sample(array_agg(t), N) except the fact that array_sample(..., N) requires the array to have at least N items. You can experiment with array_sample(array_agg(...), N) as long as the subquery returns much more than N rows. -- Best regards, Aleksander Alekseev