Thread

  1. Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update

    Tender Wang <tndrwang@gmail.com> — 2025-12-24T12:07:00Z

    Amit Langote <amitlangote09@gmail.com> 于2025年12月24日周三 16:08写道:
    
    > Hi,
    >
    > On Tue, Dec 23, 2025 at 4:07 Dean Rasheed <dean.a.rasheed@gmail.com>
    > wrote:
    >
    >> On Mon, 22 Dec 2025 at 14:51, Bh W <wangbihua.cn@gmail.com> wrote:
    >> >
    >> > The issue is that the MERGE INTO match condition is not updated.
    >> > In the MATCHED path of MERGE INTO, when the target row satisfies the
    >> match condition and the condition itself has not changed, the system should
    >> still be able to handle concurrent updates to the same target row by
    >> relying on EvalPlanQual (EPQ) to refetch the latest version of the tuple,
    >> and then proceed with the intended update.
    >> > However, in the current implementation, even though the concurrent
    >> update does not modify any columns relevant to the ON condition, the EPQ
    >> recheck unexpectedly results in a match condition failure, causing the
    >> update path that should remain MATCHED to be treated as NOT MATCHED.
    >>
    >> I spent a little time looking at this, and managed to reduce the
    >> reproducer test case down to this:
    >>
    >> -- Setup
    >> drop table if exists t1,t2;
    >> create table t1(a int primary key, b int);
    >> create table t2(a int, b int);
    >>
    >> insert into t1 values(1,0),(2,0);
    >> insert into t2 values(1,1),(2,2);
    >>
    >> -- Session 1
    >> begin;
    >> update t1 set b = b+1;
    >>
    >> -- Session 2
    >> merge into t1 using (values(1,1),(2,2)) as t3(a,b) on (t1.a = t3.a)
    >> when matched then
    >>       update set b = t1.b + 1
    >> when not matched then
    >>       insert (a,b) values (1,1);
    >>
    >> -- Session 1
    >> commit;
    >>
    >> This works fine in PG17, but fails with a PK violation in PG18.
    >> Git-bisecting points to this commit:
    >>
    >> cbc127917e04a978a788b8bc9d35a70244396d5b is the first bad commit
    >> commit cbc127917e04a978a788b8bc9d35a70244396d5b
    >> Author: Amit Langote <amitlan@postgresql.org>
    >> Date:   Fri Feb 7 17:15:09 2025 +0900
    >>
    >>     Track unpruned relids to avoid processing pruned relations
    >>
    >> Doing a little more debugging, it looks like the problem might be this
    >> change in InitPlan():
    >>
    >> -           /* ignore "parent" rowmarks; they are irrelevant at runtime */
    >> -           if (rc->isParent)
    >> +           /*
    >> +            * Ignore "parent" rowmarks, because they are irrelevant at
    >> +            * runtime.  Also ignore the rowmarks belonging to child
    >> tables
    >> +            * that have been pruned in ExecDoInitialPruning().
    >> +            */
    >> +           if (rc->isParent ||
    >> +               !bms_is_member(rc->rti, estate->es_unpruned_relids))
    >>                 continue;
    >>
    >> which seems to cause it to incorrectly skip a rowmark, which I suspect
    >> is what is causing EvalPlanQual() to return the wrong result.
    >
    >
    > Thanks for the detailed analysis and adding me to the thread, Dean.
    >
    > I would think that a case that involves no partitioning at all would be
    > untouchable by this code, but it looks like the logic I added is
    > incorrectly affecting cases where pruning isn’t even relevant. I’ll need to
    > look more carefully at why such a rowmark would exist in the rowmarks list
    > if its relation isn’t in es_unpruned_relids. Maybe the set population is
    > incorrect at some point, or perhaps it matters that the set is a copy in
    > the EPQ estate.
    >
    
    I did some debugging, and I found that:
    In  add_rte_to_flat_rtable(),  the RTE of value was not added into
    glob->AllRelids, because below codes:
    .....
    if (newrte->rtekind == RTE_RELATION ||
    (newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
    {
         glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
         glob->allRelids = bms_add_member(glob->allRelids,
             list_length(glob->finalrtable));
    }
    ....
    
    The VALUE rte was not satisfied above if, so it was not added into the
    glob->allRelids.
    Then in standard_planner(), we have:
    ....
    result->unprunableRelids = bms_difference(glob->allRelids,
     glob->prunableRelids);
    ....
    So the result->unprunableRelids contains only 1.
    
    In InitPlan(), we have:
    .....
    /*
    * Ignore "parent" rowmarks, because they are irrelevant at
    * runtime.  Also ignore the rowmarks belonging to child tables
    * that have been pruned in ExecDoInitialPruning().
    */
    if (rc->isParent ||
       !bms_is_member(rc->rti, estate->es_unpruned_relids))
        continue;
    .....
    the estate->es_unpruned_relids equals with result->unprunableRelids
    contains. So the rowMark was skipped incorrectly.
    
    I did a quick fix as the attached patch.
    Any thoughts?
    
    -- 
    Thanks,
    Tender Wang