Thread

  1. Re: Use merge-based matching for MCVs in eqjoinsel

    Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> — 2025-09-09T07:29:36Z

    Hi David,
    
    On 08.09.2025 17:36, David Geier wrote:
    > Do you think anything else needs changes in the patch?
    
    
     From an architectural perspective, I think the patch is already in good 
    shape. However, I have some suggestions regarding code style:
    
     1. I would move McvHashEntry, McvHashContext, he new hash table
        definition, hash_mcv and are_mcvs_equal to the top.
     2. I’m not sure get_hash_func_oid() is needed at all – it seems we
        could do without it.
     3. It would be better to name the parameters matchProductFrequencies ->
        matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1,
        hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable().
     4. As I wrote earlier, since we now have a dedicated function
        eqjoinsel_inner_with_hashtable(), perhaps it could be used in both
        eqjoinsel_inner() and eqjoinsel_semi(). And since the hash-based
        search was factored out, maybe it would make sense to also factor
        out the O(N^2) nested loop implementation?
     5. I think it would be helpful to add a comment explaining that using a
        hash table is not efficient when the MCV array length equals 1:
    
    if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
         return false;
    
    > Did you have a
    > chance to check tables with just few MCVs or are there any queries in
    > the JOB which regress with very small default_statistics_target?
    
    
    Sure. I need some time to check this.
    
    -- 
    Best regards,
    Ilia Evdokimov,
    Tantor Labs LLC,
    https://tantorlabs.com