Thread

  1. Re: Support tid range scan in parallel?

    David Rowley <dgrowleyml@gmail.com> — 2025-11-07T05:31:20Z

    On Sat, 30 Aug 2025 at 05:32, Cary Huang <cary.huang@highgo.ca> wrote:
    >
    >  > The workers are ending their scan early because
    >  > heap_getnextslot_tidrange() returns false on the first call from the
    >  > parallel worker.
    
    > Yes, the previous v9 patch missed setting node->trss_mintid and
    > node->trss_maxtid, causing the parallel workers to exit early due to
    > heap_getnextslot_tidrange() returning false.
    >
    > With the attached v10 patch, the parallel leader and workers now
    > have to evaluate (TidRangeEval()) the given TID ranges and set them
    > via ExecTidRangeScanInitializeDSM(),
    > ExecTidRangeScanReInitializeDSM() and
    > ExecTidRangeScanInitializeWorker().
    >
    > To prevent the leader and the workers from calling heap_setscanlimits()
    > and trying to set phs_startblock and phs_numblock concurrently in
    > shared memory, I added a condition to only allow parallel
    > leader to set them. Since node->trss_mintid and node->trss_maxtid
    > reside in local memory, the workers still have to call heap_set_tidrange
    > () to have them set to return correct scan results:
    
    I spent quite a bit of time looking at this. I didn't like the way
    heap_setscanlimits() did:
    
    + /* set the limits in the ParallelBlockTableScanDesc, when present as leader */
    + if (scan->rs_base.rs_parallel != NULL && !IsParallelWorker())
    
    as it wasn't clear to me that this didn't break completely when the
    leader didn't make it there first.
    
    I've made quite a few revisions to the v10 patch, which I've attached
    as 11-0002. v11-0001 your v10 rebased atop of master.
    
    Here's a summary of the changes:
    
    1. Moved block limiting logic for parallel scans into
    table_block_parallelscan_startblock_init(). There's currently a lock
    here to ensure only 1 worker can set the shared memory fields at a
    time. I've hooked into the same lock to set the startblock and
    numblocks.
    2. Fixed chunk size ramp-down code which is meant to divvy up the scan
    into smaller and smaller chunks as it nears completion so that one
    worker doesn't get left with too much work and leave the others with
    nothing.  That code still thought that it was scanning every block in
    the table.
    3. Changed things around so that the min/max TID for the parallel scan
    is specified via table_rescan_tidrange(). This means zero changes to
    TidRangeNext, and the only additions to nodeTidrangescan.c are for the
    shared memory handling.
    4. The rest of the changes are mostly cosmetic.
    
    With this version table_block_parallelscan_startblock_init() has grown
    2 extra fields. I considered we should instead rename this function
    append "_with_limit" to its name then add another function with the
    original name that calls the renamed function passing in
    InvalidBlockNumber for both new fields. I didn't do that as we only
    have a single call to the existing function, so doing that would only
    be for the benefit of extensions that happen to use that function. It
    doesn't seem overly difficult for them to adjust their code. I didn't
    find any code using that function in codesearch.debian.net.
    
    I still need to do a bit more testing on this, but in the meantime
    thought I'd share what I've done with it so that other people can look
    in parallel.
    
    David