Re: Pathify RHS unique-ification for semijoin planning
From: Richard Guo <guofenglinux@gmail.com>
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
Cc: Andy Fan <zhihuifan1213@163.com>, wenhui qiu <qiuwenhuifx@gmail.com>
Date: 2025-07-04T01:41:35Z
Lists: pgsql-hackers
Commits
- Simplify relation_has_unique_index_for()
  bf9ee294e567, 19 (unreleased), landed
- Pathify RHS unique-ification for semijoin planning
  24225ad9aafc, 19 (unreleased), landed
- Convert varatt.h access macros to static inline functions.
  e035863c9a04, 19 (unreleased), cited
- Re-export a few of createplan.c's make_xxx() functions.
  570be1f73f38, 9.6.0, cited
On Thu, Jul 3, 2025 at 7:06 PM Richard Guo <guofenglinux@gmail.com> wrote:
> This patch does not apply again, so here is a new rebase.
>
> This version also fixes an issue related to parameterized paths: if
> the RHS has LATERAL references to the LHS, unique-ification becomes
> meaningless because the RHS depends on the LHS, and such paths should
> not be generated.
(The cc list is somehow lost; re-ccing.)
FWIW, I noticed that the row/cost estimates for the unique-ification
node on master can be very wrong. For example:
create table t(a int, b int);
insert into t select i%100, i from generate_series(1,10000)i;
vacuum analyze t;
set enable_hashagg to off;
explain (costs on)
select * from t t1, t t2 where (t1.a, t2.b) in
(select a, b from t t3 where t1.b is not null offset 0);
And look at the snippet from the plan:
(on master)
 ->  Unique  (cost=934.39..1009.39 rows=10000 width=8)
       ->  Sort  (cost=271.41..271.54 rows=50 width=8)
             Sort Key: "ANY_subquery".a, "ANY_subquery".b
             ->  Subquery Scan on "ANY_subquery"  (cost=0.00..270.00 rows=50 width=8)
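The row estimate for the subpath is 50, but it increases to 10000
after unique-ification, even though removing duplicates can never
produce more rows than the input has. The startup cost of the Unique
node (934.39) is also far above the total cost of the Sort beneath it
(271.54). How does that make sense?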
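As a rough sanity check on where 10000 might come from: it happens to equal the number of distinct (a, b) pairs in the whole table, since b is unique per row. The sketch below only confirms that figure. Whether master actually derives its estimate this way is an assumption and has not been verified here, and t_demo is a hypothetical scratch table used so the block runs on its own.

```sql
-- Hypothetical scratch copy of the test table (same data as t above).
create temp table t_demo(a int, b int);
insert into t_demo select i%100, i from generate_series(1,10000)i;

-- Count the distinct (a, b) pairs over the entire table, ignoring any
-- filtering applied inside the subquery.
select count(*) as distinct_pairs
from (select distinct a, b from t_demo) s;
-- distinct_pairs is 10000, because b takes a different value in every row
```

If that reading is right, the 10000 figure reflects the whole table rather than the 50 rows the subpath is expected to return.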
This issue does not occur with this patch:
(on patched)
 ->  Unique  (cost=271.41..271.79 rows=50 width=8)
       ->  Sort  (cost=271.41..271.54 rows=50 width=8)
             Sort Key: "ANY_subquery".a, "ANY_subquery".b
             ->  Subquery Scan on "ANY_subquery"  (cost=0.00..270.00 rows=50 width=8)
Thanks
Richard