Thread
-
Re: Asynchronous MergeAppend
Matheus Alcantara <matheusssilv97@gmail.com> — 2025-11-17T21:09:19Z
On Sat Nov 15, 2025 at 7:57 AM -03, Alexander Pyhalov wrote: >> Here are some comments on my first look at the patches. I still don't >> have too much experience with the executor code but I hope that I can >> help with something. >> >> v5-0001-mark_async_capable-subpath-should-match-subplan.patch >> >> I don't have to much comments on this, perhaps we could have a commit >> message explaining the reason behind the change. > > I've expanded commit message. The issue is that mark_async_capable() > relies > on the fact that plan node type corresponds to path type. This is not > true when > (Merge)Append decides to insert Sort node. > Your explanation about why this change is needed that you've include on your first email sounds more clear IMHO. I would suggest the following for a commit message: mark_async_capable() believes that path corresponds to plan. This is not true when create_[merge_]append_plan() inserts sort node. In this case mark_async_capable() can treat Sort plan node as some other and crash. Fix this by handling the Sort node separately. This is needed to make MergeAppend node async-capable that it will be implemented in a next commit. What do you think? I was reading the patch changes again and I have a minor point: SubqueryScan *scan_plan = (SubqueryScan *) plan; /* - * If the generated plan node includes a gating Result node, - * we can't execute it asynchronously. + * If the generated plan node includes a gating Result node or + * a Sort node, we can't execute it asynchronously. */ - if (IsA(plan, Result)) + if (IsA(plan, Result) || IsA(plan, Sort)) Shouldn't we cast the plan to SubqueryScan* after the IsA(...) check? I know this code has been before your changes but type casting before a IsA() check sounds a bit strange to me. Also perhaps we could add an Assert(IsA(plan, SubqueryScan)) after the IsA(...) check and before the type casting just for sanity? >> ---- >> >> v5-0002-MergeAppend-should-support-Async-Foreign-Scan-sub.patch >> >> The AppendState struct has the "as_syncdone", this field is not needed >> on MergeAppendState? > > We don't need as_syncdone. Async Append fetches tuple from async subplan > and waits for them either when they have some data or when there's no > more sync subplans (as we can return any tuple we receive from > subplans). But ExecMergeAppend should decide which tuple to return based > on sort order, so there's no need to remember if we are done with sync > subplans, as subplans ordering matters, and we can't arbitrary switch > between them. > Ok, thanks for the explanation. > >> Regarding the duplicated code on classify_matching_subplans I think >> that >> we can have a more generic function that operates over function >> parameters > > I've tried to do so, but there are two issues. There's no suitable > common header between > nodAppend and nodeMergeAppend. I've put > classify_matching_subplans_common() into src/include/nodes/execnodes.h > and sure that it's not the best choice. The second issue is with > as_syncdone, it exists only in AppendState, so > we should check for empty valid_subplans separately. In fact, there's 3 > outcomes for Append 1) no sync plans, > no async plans, 2) no async plans, 3) async plans present, and only > last two states have meaning > for MergeAppend. > I think that's ok to have these separated checks on nodeAppend.c and nodeMergeAppend.c once the majority of duplicated steps that would be required is centralized into a single reusable function. I also agree that execnodes.h may not be the best place to declare this function but I also don't have too many ideas of where to put it. Let's see if we have more comments on this. >> ---- >> We have some reduction of code coverage on nodeMergeAppend.c. The >> significant blocks are on ExecMergeAppendAsyncBegin(): >> + /* If we've yet to determine the valid subplans then do so now. */ >> + if (!node->ms_valid_subplans_identified) >> + { >> + node->ms_valid_subplans = >> + ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL); >> + node->ms_valid_subplans_identified = true; >> + >> + classify_matching_subplans(node); >> + } >> >> And there are some blocks on ExecReScanMergeAppend(). It's worth adding >> a test case for then? I'm not sure how hard would be to write a >> regression test that cover these blocks. >> > > You are right. There's difference between ExecAppend and > ExecMergeAppend. Append identifies valid subplans in > ExecAppendAsyncBegin. MergeAppend - earlier, in ExecMergeAppend(). So > this is really the dead code. And there was an issue with it, which > became evident when I've added test for rescan. When we've identified > new subplans in ExecMergeAppend(), we have to classify them. > Thanks, the code coverage looks better now. I plan to do another round of review on 0002, in the meantime I'm sharing these comments for now. -- Matheus Alcantara EDB: http://www.enterprisedb.com