Thread
-
Re: Fixing some ancient errors in hash join costing
Chao Li <li.evan.chao@gmail.com> — 2025-12-29T00:39:38Z
> On Dec 29, 2025, at 06:56, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I wrote: >> I was amused to notice that the postgres_fdw.out change made in my >> patch reverts one made in aa86129e1 (which also affected semijoin >> costing). So we've had trouble before with that test case being >> fundamentally unstable. I wonder if we shouldn't do something to try >> to stabilize it? I see that the test immediately before this one >> forces the matter by turning off enable_sort (which'd affect only >> the local side not the remote). That's a hack all right but maybe >> we should extend it to this test. > > Here's a v2 patchset that splits that out as a separate change for > clarity's sake. I also spent a bit of effort on commit log messages, > including researching the git history. > > regards, tom lane > Just a few small comments on v2: 1 - 0001 ``` -SET enable_hashjoin TO off; +-- These next tests require choosing between remote and local sort, which is +-- a coin flip so long as cost_sort() gives the same results on both sides. +-- To stabilize the expected plans, disable sorting locally. SET enable_sort TO off; -- subquery using stable function (can't be sent to remote) +SET enable_hashjoin TO off; -- this one needs even more help to be stable ``` This is not a concern, just curious why switch the setting order of enable_hashjoin and enable_sort? 2 - 0002 ``` + else if (get_attstatsslot(&sslot, vardata.statsTuple, + STATISTIC_KIND_HISTOGRAM, InvalidOid, + 0)) + { + /* + * If there are no recorded MCVs, but we do have a histogram, then + * assume that ANALYZE determined that the column is unique. + */ + if (vardata.rel && vardata.rel->rows > 0) + *mcv_freq = 1.0 / vardata.rel->rows; + free_attstatsslot(&sslot); + } ``` As flag 0 is passed to get_attstatsslot(), free_attstatsslot() is not needed. The function comment of get_attstatsslot states about that: ``` * Passing flags=0 can be useful to quickly check if the requested slot type * exists. In this case no arrays are extracted, so free_attstatsslot need * not be called. ``` 3 - In funciton estimate_hash_bucket_stats(), when mcv_freq is initialized: ``` /* Look up the frequency of the most common value, if available */ *mcv_freq = 0.0; ``` Maybe worth adding a short comment like “0.0 doesn’t mean zero frequency, instead 0.0 means no data or unknown frequency”, which might help code readers to quickly understand the logic. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/