Thread

  1. Re: Fixing some ancient errors in hash join costing

    Chao Li <li.evan.chao@gmail.com> — 2025-12-29T00:39:38Z

    
    > On Dec 29, 2025, at 06:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > 
    > I wrote:
    >> I was amused to notice that the postgres_fdw.out change made in my
    >> patch reverts one made in aa86129e1 (which also affected semijoin
    >> costing).  So we've had trouble before with that test case being
    >> fundamentally unstable.  I wonder if we shouldn't do something to try
    >> to stabilize it?  I see that the test immediately before this one
    >> forces the matter by turning off enable_sort (which'd affect only
    >> the local side not the remote).  That's a hack all right but maybe
    >> we should extend it to this test.
    > 
    > Here's a v2 patchset that splits that out as a separate change for
    > clarity's sake.  I also spent a bit of effort on commit log messages,
    > including researching the git history.
    > 
    > regards, tom lane
    > 
    
    Just a few small comments on v2:
    
    1 - 0001
    ```
    -SET enable_hashjoin TO off;
    +-- These next tests require choosing between remote and local sort, which is
    +-- a coin flip so long as cost_sort() gives the same results on both sides.
    +-- To stabilize the expected plans, disable sorting locally.
    SET enable_sort TO off;
    -- subquery using stable function (can't be sent to remote)
    +SET enable_hashjoin TO off;  -- this one needs even more help to be stable
    ```
    
    This is not a concern, just curious why switch the setting order of enable_hashjoin and enable_sort?
    
    2 - 0002
    ```
    +		else if (get_attstatsslot(&sslot, vardata.statsTuple,
    +								  STATISTIC_KIND_HISTOGRAM, InvalidOid,
    +								  0))
    +		{
    +			/*
    +			 * If there are no recorded MCVs, but we do have a histogram, then
    +			 * assume that ANALYZE determined that the column is unique.
    +			 */
    +			if (vardata.rel && vardata.rel->rows > 0)
    +				*mcv_freq = 1.0 / vardata.rel->rows;
    +			free_attstatsslot(&sslot);
    +		}
    ```
    
    As flag 0 is passed to get_attstatsslot(), free_attstatsslot() is not needed. The function comment of get_attstatsslot states about that:
    ```
    * Passing flags=0 can be useful to quickly check if the requested slot type
    * exists. In this case no arrays are extracted, so free_attstatsslot need
    * not be called.
    ```
    
    3 - In funciton estimate_hash_bucket_stats(), when mcv_freq is initialized:
    ```
    /* Look up the frequency of the most common value, if available */
    *mcv_freq = 0.0;
    ```
    
    Maybe worth adding a short comment like “0.0 doesn’t mean zero frequency, instead 0.0 means no data or unknown frequency”, which might help code readers to quickly understand the logic.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/