Use merge-based matching for MCVs in eqjoinsel

Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>

From: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: 2025-07-21T13:55:56Z
Lists: pgsql-hackers

Attachments

Hi hackers,

While analyzing planner performance on JOB with 
default_statistics_target = 1000, I noticed that a significant portion 
of planning time is spent inside the eqjoinsel() function. According to 
perf, in most JOB queries at default_statistics_target = 1000, 
eqjoinsel() is the most expensive function during planning, accounting 
for approximately 8% of total CPU time. At default_statistics_target = 
10000, the planner spend up to 75% of its time inside eqjoinsel(), 
making it one of the primary bottlenecks.

Еhis overhead is caused by the O(N^2) nested-loop comparison of MCVs in 
var1 = var2 clauses.

I propose an optimization: when the column datatype supports 
ordering(i.e., has < and >), we can sort both MCV lists and apply 
mege-style algorithm to detect matches. This reduces runtime from O(N^2) 
to O(NlogN), where N is the number of MCV entries. The patch also 
applies the same optimization to semi-join clauses, which show similar 
performance behavior.

On JOB, this changes reduce planner time in most queries with complex 
joins and large MCVs with no observable effect on plan quality. I’ve 
also attached bar charts showing per-query planner time before and after 
the patch for default_statistics_target = 100, 1000, 10000 along with 
query numbers for reference.

Any feedback or suggestions are welcome!

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.