Re: Pathify RHS unique-ification for semijoin planning
wenhui qiu <qiuwenhuifx@gmail.com>
From: wenhui qiu <qiuwenhuifx@gmail.com>
To: Andy Fan <zhihuifan1213@163.com>
Cc: Richard Guo <guofenglinux@gmail.com>, PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: 2025-05-22T13:51:32Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Simplify relation_has_unique_index_for()
- bf9ee294e567 19 (unreleased) landed
-
Pathify RHS unique-ification for semijoin planning
- 24225ad9aafc 19 (unreleased) landed
-
Convert varatt.h access macros to static inline functions.
- e035863c9a04 19 (unreleased) cited
-
Re-export a few of createplan.c's make_xxx() functions.
- 570be1f73f38 9.6.0 cited
On Thu, 22 May 2025 at 17:28, Andy Fan <zhihuifan1213@163.com> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > Hi, > > > However, in the case of sort-based implementation, > > this function pays no attention to the subpath's pathkeys or the > > pathkeys of the resulting output. > > Good finding! > > > > > In addition to this specific issue, it seems to me that there are > > other potential issues in create_unique_path(). > > > > * Currently, we only consider the cheapest_total_path of the RHS when > > unique-ifying it. > > I think it is better have a check the tuple_fraction for the startup_cost > factor, for some paths where the total cost is high but the required > fraction is lower. > > > I think this may cause us to miss some optimization > > opportunities. For example, there might be a path with a better sort > > order that isn't the cheapest-total one. Such a path could help avoid > > a sort at a higher level, potentially resulting in a cheaper overall > > plan. > > > * In create_unique_path(), we currently rely on heuristics to decide > > whether to use a hash-based or sort-based method. I think a better > > approach would be to create paths for both methods and let add_path() > > determine which one is better, similar to how we handle path selection > > in other parts of the planner. > > > > Therefore, I'm thinking that maybe we could create a new RelOptInfo > > for the RHS rel to represent its unique-ified version, and then > > generate all worthwhile paths for it, > > This sounds great for me. and I think we can keep the fraction > cheapest path on the new RelOptInfo as well, then all the things should > be on the way. > > > To be concrete, I'm envisioning something like the following: > > > > if (bms_equal(sjinfo->syn_righthand, rel2->relids) && > > - create_unique_path(root, rel2, rel2->cheapest_total_path, > > - sjinfo) != NULL) > > + (rel2_unique = create_unique_rel(root, rel2, sjinfo)) != > NULL) > > > > ... > > > > - add_paths_to_joinrel(root, joinrel, rel1, rel2, > > - JOIN_UNIQUE_INNER, sjinfo, > > + add_paths_to_joinrel(root, joinrel, rel1, rel2_unique, > > + JOIN_INNER, sjinfo, > > restrictlist); > > - add_paths_to_joinrel(root, joinrel, rel2, rel1, > > - JOIN_UNIQUE_OUTER, sjinfo, > > + add_paths_to_joinrel(root, joinrel, rel2_unique, rel1, > > + JOIN_INNER, sjinfo, > > restrictlist); > > > > In addition, by changing the join from "rel1" and "rel2" using > > JOIN_UNIQUE_OUTER or JOIN_UNIQUE_INNER to a join between "rel1" and > > "rel2_unique" using a standard JOIN_INNER, we might be able to get > > rid of the JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER jointypes. > > >>if we can, +10. Agree Pls kindly release a path for this? > > > > > >