Thread

Re: BUG #19106: Potential regression with CTE materialization planning in Postgres 18

Richard Guo <guofenglinux@gmail.com> — 2025-11-12T07:04:05Z
On Wed, Nov 12, 2025 at 5:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thanks for doing that research.  It's not at all surprising if back
> in the early 90's nobody had tried to make it work for sub-selects
> (or at least had not succeeded), so they just wrote the spec to not
> require it.

I played with this patch, but I couldn't quite wrap my head around the
expected behavior of using subqueries as arguments to aggregate
functions.  The outputs of the following queries are confusing to me.

create table t (a int);
insert into t values (1), (2);

Query 1:
select (select sum((select a from t t1 limit 1))) from t t2;
 sum
-----
   1
   1
(2 rows)

As I understand it, a query of the form:

    SELECT <scalar_expression> FROM table;

... produces one output row for each row in the table, with the value
of <scalar_expression> evaluated for that row.  Thus, the output of
Query 1 makes sense to me.

Query 2:
select (select sum((select a from t t1 where a = t2.a or true limit
1))) from t t2;
 sum
-----
   2
(1 row)

I don't quite understand the output of Query 2.  The subquery is now
correlated with the outer table t2, but I believe it's still in the
same form as Query 1, so I would expect it to also produce one output
row per table row.  Moreover, IIUC, the "or true" clause should make
the two queries semantically equivalent.

Query 3:
with t as (select a from (values (1), (2)) as v(a))
select (select sum((select a from t t1 where a = t2.a or true limit
1))) from t t2;
 sum
-----
   1
   1
(2 rows)

Query 3 replaces the physical table with a CTE that produces the same
logical table content, so I would expect the query's output to remain
unchanged.  So the differing outputs of Query 2 and Query 3 are also
confusing to me.

- Richard