Thread

  1. Re: DSA overflow in hash join

    Konstantin Knizhnik <knizhnik@garret.ru> — 2025-07-31T15:13:56Z

    On 27/07/2025 8:24 PM, Konstantin Knizhnik wrote:
    >
    > I still trying to understand the reason of DSA overflow in hash join.
    > In addition to two suspicious places where number of buckets is 
    > doubled without chek for overflow (nodeHash.c:1668 and nodeHash.c:3290),
    > there is one  more place  where number of batches is multiplied by 
    > `EstimateParallelHashJoinBatch(hashtable)` which is
    >
    > sizeof(ParallelHashJoinBatch) + (sizeof(SharedTuplestore)  + 
    > sizeof(SharedTuplestoreParticipant) * participants) * 2
    >
    > which is 480 bytes!
    >
    > But when we calculate maximal number of batches, we limit it by 
    > macximal number of pointers (8 bytes):
    >
    >     max_pointers = hash_table_bytes / sizeof(HashJoinTuple);
    >     max_pointers = Min(max_pointers, MaxAllocSize / 
    > sizeof(HashJoinTuple));
    >     /* If max_pointers isn't a power of 2, must round it down to one */
    >     max_pointers = pg_prevpower2_size_t(max_pointers);
    >
    >     /* Also ensure we avoid integer overflow in nbatch and nbuckets */
    >     /* (this step is redundant given the current value of MaxAllocSize) */
    >     max_pointers = Min(max_pointers, INT_MAX / 2 + 1);
    >
    >     dbuckets = ceil(ntuples / NTUP_PER_BUCKET);
    >     dbuckets = Min(dbuckets, max_pointers);
    >     nbuckets = (int) dbuckets;
    >
    >
    > But as we see, here multiplier is 480 bytes, not 8 bytes.
    >
    
    Below is script to reproduce the problem:
    
    CREATE TABLE IF NOT EXISTS t0(c0 FLOAT, PRIMARY KEY(c0)) WITH 
    (parallel_workers=966);
    CREATE TABLE t2(c0 DECIMAL, c1 int4range ) WITH (parallel_workers=393);
    CREATE TABLE t4(LIKE t2);
    CREATE TABLE t5(LIKE t0);
    INSERT INTO t4(c0) VALUES(0.5934077416223362);
    
    set work_mem='10MB';
    set max_parallel_workers_per_gather=5;
    
    explain SELECT SUM(count) FROM (SELECT ALL CAST(FALSE AS INT) as count 
    FROM ONLY t5, ONLY t2 CROSS JOIN ONLY t0 LEFT OUTER JOIN t4* ON 
    (upper(((t2.c1)+(t2.c1))))::BOOLEAN CROSS JOIN (SELECT t4.c0 FROM ONLY 
    t0, t2*, t5*, t4* WHERE (((t2.c1)*(t2.c1))) IN (t4.c1)) AS sub0) as res;
    
    SELECT SUM(count) FROM (SELECT ALL CAST(FALSE AS INT) as count FROM ONLY 
    t5, ONLY t2 CROSS JOIN ONLY t0 LEFT OUTER JOIN t4* ON 
    (upper(((t2.c1)+(t2.c1))))::BOOLEAN CROSS JOIN (SELECT t4.c0 FROM ONLY 
    t0, t2*, t5*, t4* WHERE (((t2.c1)*(t2.c1))) IN (t4.c1)) AS sub0) as res;
    
    
    And attached please find patch fixing the issue.