Re: Use merge-based matching for MCVs in eqjoinsel

Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>

From: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: 2025-09-03T16:53:51Z
Lists: pgsql-hackers

Attachments

Following up on my previous messages about optimizing eqjoinsel() and 
eqjoinsel_semi() for Var1 = Var2 clauses, I’d like to share detailed 
profiling results showing the effect of the patch on JOB for different 
values of default_statistics_target.

The first table shows the total planner time (summed over all 113 
queries) before and after applying the patch, along with the speedup 
achieved:

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | 
Planner After (ms)
--------------------------+---------------------+---------------------+--------------------
                      100  | *1.00x*       | 1828.433     |        1820.556
                     1000  | *1.12x*       | 2194.282     |        1963.110
                     2500  | *2.15x*       | 4606.705     |        2140.126
                     5000  | *6.37x*       | 16661.581     |        2616.109
                     7500  | *11.76x*       | 35988.569     |        
3061.161
                    10000  | *19.01x*       | 66616.620     |        
3504.144


The second table shows the profiling of eqjoinsel() using *perf*, 
demonstrating that the function, which dominates planning at high 
statistics targets, becomes essentially negligible after the patch:

default_statistics_target | eqjoinsel() Before (perf) | eqjoinsel() 
After (perf)
--------------------------+---------------------------+--------------------------
                      100  |                     0.01% 
|                     0.04%
                     1000  |                     6.23% 
|                     0.06%
                     2500  |                    35.45% 
|                     0.23%
                     5000  |                    66.14% 
|                     0.53%
                     7500  |                    72.70% 
|                     0.97%
                    10000  |                    75.42% 
|                     1.25%

I’ve attached v3 of the patch. This version adds a check for NULL values 
when comparing MCV entries, ensuring correctness in edge cases.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com