Re: Indexes on expressions with multiple columns and operators

From: Andrei Lepikhov <lepihov@gmail.com>

To: Frédéric Yhuel <frederic.yhuel@dalibo.com>, Tom Lane <tgl@sss.pgh.pa.us>

Cc: "pgsql-performance@lists.postgresql.org" <pgsql-performance@lists.postgresql.org>, Jehan-Guillaume de Rorthais <jgdr@dalibo.com>, Christophe Courtois <christophe.courtois@dalibo.com>, Laurenz Albe <laurenz.albe@cybertec.at>

Date: 2025-09-23T10:43:32Z

Lists: pgsql-performance

On 23/9/2025 12:20, Frédéric Yhuel wrote:
> On 9/22/25 23:15, Andrei Lepikhov wrote:
>> It may solve at least one issue with the 'dependencies' statistics: a 
>> single number describing the dependency between any two values in the 
>> columns often leads to incorrect estimations, as I see.
> 
> For what it's worth, I've never encountered a case in my life as a 
> PostgreSQL support engineer where the 'dependency' kind could be useful. 
> I only successfully used the 'mcv' kind once (and that was only 
> partially successful, as it fixed the estimates but not the plan).

Thanks for your feedback!
I also don't think the 'dependencies' statistics are very useful now, 
especially considering how many computational resources they need when 
multiple columns are involved.
But is it the same for the 'ndistinct' statistics? It seems you should 
love them - the number of groups in GROUP BY, DISTINCT, and even HashJoin 
should be estimated more precisely, no?
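
To make the point about group-count estimates concrete, here is a toy
model in Python. It is only a sketch and not PostgreSQL's estimator code:
the function names and the city/zip data are made up for illustration,
and the "naive" rule (multiply per-column distinct counts, cap at the row
count) is a simplification of what happens when columns are assumed to be
independent.

```python
def naive_group_estimate(rows):
    """Assume the columns are independent: multiply the per-column
    distinct counts and cap the result at the number of rows."""
    n_a = len({a for a, _ in rows})
    n_b = len({b for _, b in rows})
    return min(n_a * n_b, len(rows))


def actual_group_count(rows):
    """Count the distinct (a, b) combinations that really occur, which is
    the quantity a multi-column distinct-count statistic records."""
    return len(set(rows))


# Hypothetical data: 10,000 rows, 1,000 zip codes, and each zip code
# belongs to exactly one of 100 cities (city = zip // 10).
rows = [((i % 1000) // 10, i % 1000) for i in range(10_000)]

print(naive_group_estimate(rows))  # 10000
print(actual_group_count(rows))    # 1000
```

Because the zip code determines the city, the independence assumption
overestimates the number of (city, zip) groups tenfold here. In
PostgreSQL, the real count for such a column pair can be collected with
CREATE STATISTICS ... (ndistinct) ON a, b FROM t, followed by ANALYZE.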

-- 
regards, Andrei Lepikhov