Thread

  1. Re: Asynchronous MergeAppend

    Alexander Pyhalov <a.pyhalov@postgrespro.ru> — 2025-11-20T14:22:32Z

    Matheus Alcantara писал(а) 2025-11-20 00:51:
    > On Tue Nov 18, 2025 at 4:14 AM -03, Alexander Pyhalov wrote:
    >> Updated the first patch.
    >> 
    > Thanks for the new version. Some new comments.
    > 
    > v7-0002-MergeAppend-should-support-Async-Foreign-Scan-subpla.patch:
    > 
    > 1. Should be "nasyncplans" instead of "nplans"? ExecInitAppend use
    > "nasyncplans" to allocate the as_asyncresults array.
    > 
    > +		mergestate->ms_asyncresults = (TupleTableSlot **)
    > +			palloc0(nplans * sizeof(TupleTableSlot *));
    > +
    > 
    
    No. There's a difference between how Append and MergeAppend handle async 
    results.
    
    When Append looks for the next result, it can return any of them.
    So, async results are not ordered in Append.
    We have maximum nasyncplans of async results and return the first 
    available result
    when we asked for one . For example, in ExecAppendAsyncRequest():
    
    1004         /*
    1005          * If there are any asynchronously-generated results that 
    have not yet
    1006          * been returned, we have nothing to do; just return one of 
    them.
    1007          */
    1008         if (node->as_nasyncresults > 0)
    1009         {
    1010                 --node->as_nasyncresults;
    1011                 *result = 
    node->as_asyncresults[node->as_nasyncresults];
    1012                 return true;
    1013         }
    
    ExecAppendAsyncGetNext() looks (via ExecAppendAsyncRequest()) on any 
    result.
    
    However, when we are asked for result in MergeAppend, we should return 
    result of
    the specific subplan. To achieve this we should know, which subplan 
    given results correspond to.
    So, we enumerate async results in the same way as requests (or 
    ms_valid_asyncplans).
    Look at ExecAppendAsyncGetNext()/ExecAppendAsyncRequest().
    
    > 2. I think that this comment should be updated. IIUC ms_valid_subplans
    > may not be all subplans because classify_matching_subplans() may move
    > async ones to ms_valid_asyncplans. Is that right?
    > 
    > /*
    >  * If we've yet to determine the valid subplans then do so now.  If
    >  * run-time pruning is disabled then the valid subplans will always be
    >  * set to all subplans.
    >  */
    > 
    
    Yes, you are correct, and similar comment in nodeAppend.c lacks the last 
    sentence.
    Removed it.
    
    > 3. This code comment should also mention the 
    > Assert(!bms_is_member(...));?
    > 
    > +	/* The result should be a TupleTableSlot or NULL. */
    > +	Assert(slot == NULL || IsA(slot, TupleTableSlot));
    > +	Assert(!bms_is_member(areq->request_index, 
    > node->ms_has_asyncresults));
    > 
    
    > 4. It's worth include bms_num_members(node->ms_needrequest) <= 0 check
    > on ExecMergeAppendAsyncRequest() as an early return? IIUC it would 
    > avoid
    > the bms_is_member(), bms_next_member() and bms_is_empty(needrequest)
    > calls.
    
    We can't exclude the first bms_is_member(), as node->ms_needrequest can 
    be empty
    (we've already got result), so do not need to do request to get it, just
    return previously fetched result.
    
    Not sure about check above the following lines:
    
    650         i = -1;
    651         while ((i = bms_next_member(node->ms_needrequest, i)) >= 0)
    652         {
    653                 if (!bms_is_member(i, node->ms_has_asyncresults))
    654                         needrequest = bms_add_member(needrequest, 
    i);
    655         }
    656
    
    I think, it shouldn't be much cheaper as bms_next_member() will execute 
    a couple instructions
    to find out that the number of words in bitmapset is zero, but will do 
    nothing expensive.
    
    > 
    > ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
    >         Bitmapset  *needrequest;
    >         int                     i;
    > 
    > +       if (bms_num_members(node->ms_needrequest) <= 0)
    > +               return false;
    > +
    > 
    
    No, as I've mentioned, we can't exclude bms_is_member(mplan, 
    node->ms_has_asyncresults) check.
    We could have received result while waiting for data for another 
    subplan.
    
    Let's assume, we have 2 async subplans (0 and 1). For example, we've 
    decided
    to get data from subplan 1. We 've already send requests to both async 
    subplans (in ExecMergeAppendAsyncBegin() or later).
    Now we do ExecMergeAppendAsyncRequest(), but there's no subplans, which 
    need request.
    So we enter the waiting loop. Let's assume we got event for another 
    subplan (0). Via ExecAsyncNotify(),
    ExecAsyncForeignScanNotify() we go to postgresForeignAsyncNotify(), 
    fetch data and mark request as complete.
    Via ExecAsyncMergeAppendResponse() we save results for subplan 0 in 
    node->ms_asyncresults[0]. When we finally
    got result for subplan 1, we do the same, but now exit the loop. When 
    MergeAppend finally decides that it needs
    results from subplan 0, we already have them, but ms_needrequest is 
    empty, so  ExecMergeAppendAsyncRequest()
    just returns this pre-fetched tuple.
    -- 
    Best regards,
    Alexander Pyhalov,
    Postgres Professional