Thread

  1. Re: Remove traces of long in dynahash.c

    Dean Rasheed <dean.a.rasheed@gmail.com> — 2025-09-09T09:28:13Z

    On Tue, 9 Sept 2025 at 04:58, Michael Paquier <michael@paquier.xyz> wrote:
    >
    > On Fri, Sep 05, 2025 at 10:40:46AM +0100, Dean Rasheed wrote:
    > > Alternatively, why not just impose the upper bound at the call sites
    > > if needed, so there may be no need for these functions at all. For
    > > example, looking at nodeHash.c, it would seem much more logical to
    > > have ExecChooseHashTableSize() put an upper bound on nbuckets, rather
    > > than capping log2_nbuckets after nbuckets has been chosen, risking
    > > them getting out-of-sync.
    >
    > Yep, that may be the best course of action.  As far as I can see, this
    > is capped by palloc() and HashJoinTuple, so we should be OK with
    > putting a hard limit at (INT_MAX / 2) and call it a day, I guess?
    
    +1
    
    In ExecChooseHashTableSize():
    
    +    /*
    +     * Cap nbuckets, for power of 2 calculations.  This maximum is safe
    +     * for pg_ceil_log2_32().
    +     */
    +    if (nbuckets > PG_INT32_MAX / 2)
    +        nbuckets = PG_INT32_MAX / 2;
    
    That comment is not really right, because pg_ceil_log2_32() can accept
    larger inputs than that. But in case, that cap is wrong because
    nbuckets should always be a power of 2, and PG_INT32_MAX / 2 is 2^30 -
    1.
    
    Looking more closely, I don't think a cap is needed at all, given the
    prior computations. Further up, there's this code:
    
        /* Also ensure we avoid integer overflow in nbatch and nbuckets */
        /* (this step is redundant given the current value of MaxAllocSize) */
        max_pointers = Min(max_pointers, INT_MAX / 2 + 1);
    
    and in practice, it's constrained to be much less than that, based on
    this earlier code:
    
        max_pointers = Min(max_pointers, MaxAllocSize / sizeof(HashJoinTuple));
    
    So I think in theory that ensures that nbuckets can never get anywhere
    near overflowing a 32-bit integer. Given that nbuckets is a 32-bit
    signed integer, and it is a power of 2, it is automatically less than
    or equal to 2^30, or else if somehow there was an error in the
    preceding logic and an attempt had been made to make it larger than
    that, integer wrap-around would have occurred (e.g., it might have
    become -2^31), in which case the "Assert(nbuckets > 0)" would trap it.
    
    So I think there's no point in adding that cap, or any additional
    checks in ExecChooseHashTableSize().
    
    Regards,
    Dean