Thread

  1. Re: Asynchronous MergeAppend

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-11-17T21:09:19Z

    On Sat Nov 15, 2025 at 7:57 AM -03, Alexander Pyhalov wrote:
    >> Here are some comments on my first look at the patches. I still don't
    >> have too much experience with the executor code but I hope that I can
    >> help with something.
    >>
    >> v5-0001-mark_async_capable-subpath-should-match-subplan.patch
    >>
    >> I don't have to much comments on this, perhaps we could have a commit
    >> message explaining the reason behind the change.
    >
    > I've expanded commit message. The issue is that mark_async_capable()
    > relies
    > on the fact that plan node type corresponds to path type. This is not
    > true when
    > (Merge)Append decides to insert Sort node.
    >
    Your explanation about why this change is needed that you've include on
    your first email sounds more clear IMHO. I would suggest the following
    for a commit message:
        mark_async_capable() believes that path corresponds to plan. This is
        not true when create_[merge_]append_plan() inserts sort node. In
        this case mark_async_capable() can treat Sort plan node as some
        other and crash. Fix this by handling the Sort node separately.
    
        This is needed to make MergeAppend node async-capable that it will
        be implemented in a next commit.
    
    What do you think?
    
    I was reading the patch changes again and I have a minor point:
    
                                    SubqueryScan *scan_plan = (SubqueryScan *) plan;
    
                                    /*
    -                                * If the generated plan node includes
    a gating Result node,
    -                                * we can't execute it asynchronously.
    +                                * If the generated plan node includes
    a gating Result node or
    +                                * a Sort node, we can't execute it
    asynchronously.
                                     */
    -                               if (IsA(plan, Result))
    +                               if (IsA(plan, Result) || IsA(plan, Sort))
    
    Shouldn't we cast the plan to SubqueryScan* after the IsA(...) check? I
    know this code has been before your changes but type casting before a
    IsA() check sounds a bit strange to me. Also perhaps we could add an
    Assert(IsA(plan, SubqueryScan)) after the IsA(...) check and before the
    type casting just for sanity?
    
    >> ----
    >>
    >> v5-0002-MergeAppend-should-support-Async-Foreign-Scan-sub.patch
    >>
    >> The AppendState struct has the "as_syncdone", this field is not needed
    >> on MergeAppendState?
    >
    > We don't need as_syncdone. Async Append fetches tuple from async subplan
    > and waits for them either when they have some data or when there's no
    > more sync subplans (as we can return any tuple we receive from
    > subplans). But ExecMergeAppend should decide which tuple to return based
    > on sort order, so there's no need to remember if we are done with sync
    > subplans, as subplans ordering matters, and we can't arbitrary switch
    > between them.
    >
    Ok, thanks for the explanation.
    
    >
    >> Regarding the duplicated code on classify_matching_subplans I think
    >> that
    >> we can have a more generic function that operates over function
    >> parameters
    >
    > I've tried to do so, but there are two issues. There's no suitable
    > common header between
    > nodAppend and nodeMergeAppend. I've put
    > classify_matching_subplans_common() into src/include/nodes/execnodes.h
    > and sure that it's not the best choice. The second issue is with
    > as_syncdone, it exists only in AppendState, so
    > we should check for empty valid_subplans separately. In fact, there's 3
    > outcomes for Append 1) no sync plans,
    > no async plans,  2) no async plans, 3) async plans present, and only
    > last two states have meaning
    > for MergeAppend.
    >
    I think that's ok to have these separated checks on nodeAppend.c and
    nodeMergeAppend.c once the majority of duplicated steps that would be
    required is centralized into a single reusable function.
    
    I also agree that execnodes.h may not be the best place to declare this
    function but I also don't have too many ideas of where to put it. Let's
    see if we have more comments on this.
    
    >> ----
    >> We have some reduction of code coverage on nodeMergeAppend.c. The
    >> significant blocks are on ExecMergeAppendAsyncBegin():
    >> +    /* If we've yet to determine the valid subplans then do so now. */
    >> +    if (!node->ms_valid_subplans_identified)
    >> +    {
    >> +            node->ms_valid_subplans =
    >> +                    ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
    >> +            node->ms_valid_subplans_identified = true;
    >> +
    >> +            classify_matching_subplans(node);
    >> +    }
    >>
    >> And there are some blocks on ExecReScanMergeAppend(). It's worth adding
    >> a test case for then? I'm not sure how hard would be to write a
    >> regression test that cover these blocks.
    >>
    >
    > You are right. There's difference between ExecAppend and
    > ExecMergeAppend. Append identifies valid subplans in
    > ExecAppendAsyncBegin. MergeAppend - earlier, in ExecMergeAppend(). So
    > this is really the dead code. And there was an issue with it, which
    > became evident when I've added test for rescan. When we've identified
    > new subplans in ExecMergeAppend(), we have to classify them.
    >
    Thanks, the code coverage looks better now.
    
    I plan to do another round of review on 0002, in the meantime I'm
    sharing these comments for now.
    
    --
    Matheus Alcantara
    EDB: http://www.enterprisedb.com