Thread

  1. Re: generic plans and "initial" pruning

    amit <amitlangote09@gmail.com> — 2026-05-29T08:56:51Z

    On Thu, May 28, 2026 at 10:14 PM Thom Brown <thom@linux.com> wrote:
    > On Thu, 28 May 2026 at 09:14, Amit Langote <amitlangote09@gmail.com> wrote:
    > > It's a real bug.
    > >
    > > You're right that if PortalLockCachedPlan() replans, the QueryDesc
    > > created before the call still points at the old PlannedStmt from the
    > > released plan.  And yes, 0004 happens to fix it by rebuilding the
    > > QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
    > > broken on their own.
    > >
    > > Attached is an updated set with the fix: CreateQueryDesc now runs
    > > after PortalLockCachedPlan() returns, as you suggested.  That said,
    > > I'll probably focus first on settling the plancache refactoring that
    > > spun off from this thread [1], and then start a new thread for the
    > > pruning-aware locking work on top of it, incorporating parts of this
    > > series.
    >
    > Thanks.
    >
    > I've done another pass. I see a reference to
    > AcquireExecutorLocksUnpruned(), but I can't find this function. Is
    > this supposed to be AcquireExecutorLocksPrepared()?
    
    You're right, stale comment. It should say
    AcquireExecutorLocksPrepared(). Fixed.
    
    > And also I have a question about the new firstResultRels code
    >
    > If I've followed it right, the bit in setrefs.c records the
    > lowest-numbered RT index from leaf_result_relids as the
    > per-ModifyTable fallback that's used when all real targets get pruned
    > away, and the executor side looks it up via
    > linitial_int(node->resultRelations). For that to work those two have
    > to pick the same RT index, and the comment justifies it with
    > "partition expansion preserves RT index order". Where is that
    > preservation guaranteed?
    
    The ordering comes from expand_inherited_rtentry(), which adds child
    partitions to the range table sequentially in partition bound order.
    Since ModifyTable.resultRelations is built from the same expansion,
    its first element is the lowest-numbered RT index among the leaf
    partitions for that node. That is the same value
    bms_next_member(leaf_result_relids, -1) returns from the Bitmapset,
    because Bitmapset iteration returns members in ascending order. I've
    added a comment in setrefs.c pointing to expand_inherited_rtentry() as
    the source of this guarantee.
    
    > And with the assertion in ExecInitModifyTable:
    >
    > Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
    >
    > With writable CTEs producing more than one ModifyTable node the list
    > has several entries, so all the assert really checks is that some
    > recorded entry matches, not that the one recorded for this particular
    > node matches. If that's correct, then in a case where the wrong entry
    > happened to line up the right relation wouldn't be locked and nothing
    > would complain. Is there something that keeps these in order
    > somewhere?
    
    This is a fair observation -- the Assert checks membership in the
    global list rather than per-node correspondence. But node A's rti
    can't accidentally pass the Assert by matching an entry recorded for
    node B. Each ModifyTable node gets its own partition expansion with
    distinct RT entries. In a writable CTE like:
    
      WITH upd1 AS (UPDATE t SET ...),
           upd2 AS (UPDATE t SET ...)
      UPDATE t SET ...
    
    each UPDATE creates a separate set of leaf partition RT entries --
    upd1 might get RT indexes 5,6,7, upd2 gets 8,9,10, and the main UPDATE
    gets 11,12,13. The global firstResultRels list would be [5, 8, 11].
    When ExecInitModifyTable falls back to linitial_int(resultRelations)
    for a given node, it finds that node's own entry, because the RT index
    sets are disjoint across nodes.
    
    That said, it's worth being explicit about what protections exist at
    each layer, since this is safety-critical code:
    
    1. AcquireExecutorLocksPrepared(), added by 0004, locks every entry in
    firstResultRels unconditionally. So regardless of which rti a
    ModifyTable node falls back to, the relation will be locked.
    
    2. ExecGetRangeTableRelation() has two checks when opening a relation.
    For non-result relations (isResultRel=false), it checks
    es_unpruned_relids and raises an ERROR in release builds if the
    relation was pruned. For result relations (isResultRel=true), that
    check is intentionally skipped -- it has to be, because at least one
    result relation per ModifyTable node must remain openable even when
    all partitions are pruned, since executor code paths like ExecMerge()
    and ExecInitPartitionInfo() rely on resultRelInfo[0] being initialized
    (see commit 28317de723b). The remaining protection for result
    relations is Assert(CheckRelationLockedByMe()) inside table_open,
    which fires in debug builds.
    
    3. I've tightened ExecInitModifyTable to close this gap: the
    all-pruned fallback path now raises an elog(ERROR) in release builds
    if linitial_int(resultRelations) is not found in firstResultRels,
    rather than just an Assert. This gives result relations a
    production-visible check comparable to what es_unpruned_relids
    provides for scan relations.
    
    So the net effect is that for scan relations, opening a
    pruned-and-unlocked relation is caught by an ERROR in production via
    es_unpruned_relids. For result relations on the all-pruned fallback
    path, it's now also caught by an ERROR in production via the
    firstResultRels check in ExecInitModifyTable. The locking in
    AcquireExecutorLocksPrepared() ensures the relation is always locked
    regardless.
    
    Thanks again for the review.  A close look at these aspects by someone
    other than me is very useful.
    
    -- 
    Thanks, Amit Langote