Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c

Etsuro Fujita <etsuro.fujita@gmail.com>

From: Etsuro Fujita <etsuro.fujita@gmail.com>
To: Masahiko Sawada <sawada.mshk@gmail.com>
Cc: PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>, kristianlejao@gmail.com
Date: 2025-08-22T10:59:40Z
Lists: pgsql-bugs

Attachments

On Wed, Aug 6, 2025 at 8:30 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Wed, Aug 6, 2025 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > The attached patch includes the draft fix and regression tests (using
> > injection points).
> >
> > I don't have enough experience with the planner and FDW code area to
> > evaluate whether the patch fixes the issue in the right approach.
> > Feedback is very welcome. I've confirmed this assertion could happen
> > with the same scenario on all supported branches.
>
> Will review.  Thank you for the report and patch!

First, my apologies for the delay.

I reviewed the postgres_fdw.c part of the fix:

@@ -6313,9 +6314,18 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
     * calling foreign_join_ok(), since that function updates fpinfo and marks
     * it as pushable if the join is found to be pushable.
     */
-   if (root->parse->commandType == CMD_DELETE ||
-       root->parse->commandType == CMD_UPDATE ||
-       root->rowMarks)
+   for (PlannerInfo *proot = root; proot != NULL; proot = proot->parent_root)
+   {
+       if (proot->parse->commandType == CMD_DELETE ||
+           proot->parse->commandType == CMD_UPDATE ||
+           proot->rowMarks)
+       {
+           need_epq = true;
+           break;
+       }
+   }
+
+   if (need_epq)
    {
        epq_path = GetExistingLocalJoinPath(joinrel);
        if (!epq_path)

I think this successfully avoids the assertion failure and produces
the correct result, but sorry, I don't think it's the right way to go.
I think the root cause of this issue is in the EPQ handling of
foreign/custom joins in ExecScanFetch() in execScan.h: it runs the
recheck method function for a given foreign/custom join whenever it is
called for EPQ rechecking, but that is not 100% correct.  I think the
correct handling is: run the recheck method function for the join if
it is called for EPQ rechecking and the join is a *descendant* join in
the EPQ plan tree; otherwise run the access method function for the
join even if it is called for EPQ rechecking, like the attached (where
I used the epqParam of the given EPQState to determine whether the
join is a descendant join or not, which localizes the fix pretty
well).  For the SELECT FOR UPDATE query shown upthread, when doing an
EPQ recheck, the fix evaluates the sub-select expression in the target
list by doing a remote join, not a local join, so it would work more
efficiently than the fix you proposed.

Sorry, I have not yet reviewed the isolation test part.  Will do next.

Best regards,
Etsuro Fujita