Thread

  1. Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

    Etsuro Fujita <etsuro.fujita@gmail.com> — 2025-08-22T10:59:40Z

    On Wed, Aug 6, 2025 at 8:30 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
    > On Wed, Aug 6, 2025 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
    > > The attached patch includes the draft fix and regression tests (using
    > > injection points).
    > >
    > > I don't have enough experience with the planner and FDW code area to
    > > evaluate whether the patch fixes the issue in the right approach.
    > > Feedback is very welcome. I've confirmed this assertion could happen
    > > with the same scenario on all supported branches.
    >
    > Will review.  Thank you for the report and patch!
    
    First, my apologies for the delay.
    
    I reviewed the postgres_fdw.c part of the fix:
    
    @@ -6313,9 +6314,18 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
         * calling foreign_join_ok(), since that function updates fpinfo and marks
         * it as pushable if the join is found to be pushable.
         */
    -   if (root->parse->commandType == CMD_DELETE ||
    -       root->parse->commandType == CMD_UPDATE ||
    -       root->rowMarks)
    +   for (PlannerInfo *proot = root; proot != NULL; proot = proot->parent_root)
    +   {
    +       if (proot->parse->commandType == CMD_DELETE ||
    +           proot->parse->commandType == CMD_UPDATE ||
    +           proot->rowMarks)
    +       {
    +           need_epq = true;
    +           break;
    +       }
    +   }
    +
    +   if (need_epq)
        {
            epq_path = GetExistingLocalJoinPath(joinrel);
            if (!epq_path)
    
    I think this successfully avoids the assertion failure and produces
    the correct result, but sorry, I don't think it's the right way to go.
    I think the root cause of this issue is in the EPQ handling of
    foreign/custom joins in ExecScanFetch() in execScan.h: it runs the
    recheck method function for a given foreign/custom join whenever it is
    called for EPQ rechecking, but that is not 100% correct.  I think the
    correct handling is: run the recheck method function for the join if
    it is called for EPQ rechecking and the join is a *descendant* join in
    the EPQ plan tree; otherwise run the access method function for the
    join even if it is called for EPQ rechecking, like the attached (where
    I used the epqParam of the given EPQState to determine whether the
    join is a descendant join or not, which localizes the fix pretty
    well).  For the SELECT FOR UPDATE query shown upthread, when doing an
    EPQ recheck, the fix evaluates the sub-select expression in the target
    list by doing a remote join, not a local join, so it would work more
    efficiently than the fix you proposed.
    
    Sorry, I have not yet reviewed the isolation test part.  Will do next.
    
    Best regards,
    Etsuro Fujita