Thread

  1. Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update

    amit <amitlangote09@gmail.com> — 2025-12-24T08:08:33Z

    Hi,
    
    On Tue, Dec 23, 2025 at 4:07 Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
    
    > On Mon, 22 Dec 2025 at 14:51, Bh W <wangbihua.cn@gmail.com> wrote:
    > >
    > > The issue is that the MERGE INTO match condition is not updated.
    > > In the MATCHED path of MERGE INTO, when the target row satisfies the
    > > match condition and the condition itself has not changed, the system should
    > > still be able to handle concurrent updates to the same target row by
    > > relying on EvalPlanQual (EPQ) to refetch the latest version of the tuple,
    > > and then proceed with the intended update.
    > > However, in the current implementation, even though the concurrent
    > > update does not modify any columns relevant to the ON condition, the EPQ
    > > recheck unexpectedly results in a match condition failure, causing the
    > > update path that should remain MATCHED to be treated as NOT MATCHED.
    >
    > I spent a little time looking at this, and managed to reduce the
    > reproducer test case down to this:
    >
    > -- Setup
    > drop table if exists t1,t2;
    > create table t1(a int primary key, b int);
    > create table t2(a int, b int);
    >
    > insert into t1 values(1,0),(2,0);
    > insert into t2 values(1,1),(2,2);
    >
    > -- Session 1
    > begin;
    > update t1 set b = b+1;
    >
    > -- Session 2
    > merge into t1 using (values(1,1),(2,2)) as t3(a,b) on (t1.a = t3.a)
    > when matched then
    >       update set b = t1.b + 1
    > when not matched then
    >       insert (a,b) values (1,1);
    >
    > -- Session 1
    > commit;
    >
    > This works fine in PG17, but fails with a PK violation in PG18.
    > Git-bisecting points to this commit:
    >
    > cbc127917e04a978a788b8bc9d35a70244396d5b is the first bad commit
    > commit cbc127917e04a978a788b8bc9d35a70244396d5b
    > Author: Amit Langote <amitlan@postgresql.org>
    > Date:   Fri Feb 7 17:15:09 2025 +0900
    >
    >     Track unpruned relids to avoid processing pruned relations
    >
    > Doing a little more debugging, it looks like the problem might be this
    > change in InitPlan():
    >
    > -           /* ignore "parent" rowmarks; they are irrelevant at runtime */
    > -           if (rc->isParent)
    > +           /*
    > +            * Ignore "parent" rowmarks, because they are irrelevant at
    > +            * runtime.  Also ignore the rowmarks belonging to child tables
    > +            * that have been pruned in ExecDoInitialPruning().
    > +            */
    > +           if (rc->isParent ||
    > +               !bms_is_member(rc->rti, estate->es_unpruned_relids))
    >                 continue;
    >
    > which seems to cause it to incorrectly skip a rowmark, which I suspect
    > is what is causing EvalPlanQual() to return the wrong result.
    
    
    Thanks for the detailed analysis and for adding me to the thread, Dean.
    
    I would have expected a case that involves no partitioning at all to be
    unaffected by this code, but it looks like the logic I added is
    incorrectly affecting cases where pruning isn’t even relevant. I’ll need to
    look more carefully at why a rowmark would exist in the rowmarks list
    when its relation isn’t in es_unpruned_relids. Maybe the set is populated
    incorrectly at some point, or perhaps it matters that the EPQ estate
    works with a copy of the set.
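    
    To make the suspected failure mode concrete, here is a standalone toy
    model of the filter quoted above. It is not PostgreSQL source: ToyRowMark,
    ToyRelids, toy_is_member() and toy_count_kept() are hypothetical names
    used only for illustration, and a 64-bit mask stands in for the real
    relid set. It only shows that a non-parent rowmark is silently dropped
    whenever its RT index is missing from the set, whatever the reason for
    that omission turns out to be.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    
    /* Toy stand-in for a rowmark entry; NOT the real PlanRowMark struct. */
    typedef struct ToyRowMark
    {
        unsigned    rti;        /* range-table index of the marked relation */
        bool        isParent;   /* "parent" rowmarks are ignored at runtime */
    } ToyRowMark;
    
    /* Toy relid set: bit N set means RT index N is a member (N < 64). */
    typedef uint64_t ToyRelids;
    
    static bool
    toy_is_member(unsigned rti, ToyRelids set)
    {
        return rti < 64 && ((set >> rti) & 1u) != 0;
    }
    
    /* Count the rowmarks that survive the same test as the quoted diff. */
    static size_t
    toy_count_kept(const ToyRowMark *marks, size_t n, ToyRelids unpruned)
    {
        size_t      kept = 0;
    
        assert(marks != NULL || n == 0);
    
        for (size_t i = 0; i < n; i++)
        {
            /* Skip parents and anything not recorded as unpruned. */
            if (marks[i].isParent ||
                !toy_is_member(marks[i].rti, unpruned))
                continue;
            kept++;
        }
        return kept;
    }
    ```
    
    With non-parent marks for RT indexes 1 and 2, a set containing both
    indexes keeps both marks, while a set containing only index 2 keeps just
    one. Partitioning plays no part in the model, which matches the point
    that the omission can matter even when nothing was pruned.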
    
    I’m afk (on vacation) at the moment, so I won’t be able to dig into this
    until next week.
    
    — Amit
    