Thread

  1. Re: using extended statistics to improve join estimates

    Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> — 2025-06-09T18:38:27Z

    Hi hackers
    
    Thank you for your work.
    
    Let me start my review from the top — specifically, in clausesel.c, the 
    function clauselist_selectivity_ext():
    
    1. About the clauses == NULL check: in my opinion, it should be kept. 
    This issue has already been discussed previously [0], and I think it's 
    better to keep the safety check.
    
    2. I noticed that the patch applies extended statistics to OR clauses as 
    well. There's an example from regression tests illustrating this:
    
    Before applying ext stats:
    SELECT * FROM check_estimated_rows('select * from join_test_1 j1 join 
    join_test_2 j2 on ((j1.a + 1 = j2.a + 1) or (j1.b = j2.b))');
      estimated | actual
    -----------+--------
         104500 | 100000
    
    After applying ext stats:
    SELECT * FROM check_estimated_rows('select * from join_test_1 j1 join 
    join_test_2 j2 on ((j1.a + 1 = j2.a + 1) or (j1.b = j2.b))');
      estimated | actual
    -----------+--------
         190000 | 100000
    (1 row)
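    
    To put a number on the difference, here is a minimal sketch in plain 
    Python. It is not part of the patch, and the helper name 
    overestimate_factor is only illustrative. It divides each estimate 
    from the two outputs above by the actual row count:

```python
# Numbers copied from the two check_estimated_rows() outputs above.
ACTUAL_ROWS = 100_000
ESTIMATES = {
    "before ext stats": 104_500,
    "after ext stats": 190_000,
}


def overestimate_factor(estimated: int, actual: int) -> float:
    """Return estimated / actual; 1.0 would mean a perfect estimate."""
    return estimated / actual


# Print how far each estimate is from the actual row count.
for label, estimated in ESTIMATES.items():
    factor = overestimate_factor(estimated, ACTUAL_ROWS)
    print(f"{label}: {factor:.3f}x of actual")
```

    So for this query the estimate goes from about 4.5% above the actual 
    row count to 90% above it once ext stats are applied.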
    
    I agree that, at least for now, we should focus solely on AND clauses. 
    To do that, we should impose the same restriction in 
    clauselist_selectivity_or() as we already do in 
    clauselist_selectivity_ext().
    
    What do you think? Or should we handle OR clauses as well?
    
    [0]: 
    https://www.postgresql.org/message-id/flat/016e33b7-2830-4300-bc89-e7ce9e613bad%40tantorlabs.com
    
    --
    Best regards,
    Ilia Evdokimov,
    Tantor Labs LLC.