Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update
amit <amitlangote09@gmail.com>
From: Amit Langote <amitlangote09@gmail.com>
To: Dean Rasheed <dean.a.rasheed@gmail.com>
Cc: Bh W <wangbihua.cn@gmail.com>, Laurenz Albe <laurenz.albe@cybertec.at>, pgsql-bugs@lists.postgresql.org
Date: 2025-12-24T08:08:33Z
Lists: pgsql-bugs
Hi, On Tue, Dec 23, 2025 at 4:07 Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On Mon, 22 Dec 2025 at 14:51, Bh W <wangbihua.cn@gmail.com> wrote: > > > > The issue is that the MERGE INTO match condition is not updated. > > In the MATCHED path of MERGE INTO, when the target row satisfies the > match condition and the condition itself has not changed, the system should > still be able to handle concurrent updates to the same target row by > relying on EvalPlanQual (EPQ) to refetch the latest version of the tuple, > and then proceed with the intended update. > > However, in the current implementation, even though the concurrent > update does not modify any columns relevant to the ON condition, the EPQ > recheck unexpectedly results in a match condition failure, causing the > update path that should remain MATCHED to be treated as NOT MATCHED. > > I spent a little time looking at this, and managed to reduce the > reproducer test case down to this: > > -- Setup > drop table if exists t1,t2; > create table t1(a int primary key, b int); > create table t2(a int, b int); > > insert into t1 values(1,0),(2,0); > insert into t2 values(1,1),(2,2); > > -- Session 1 > begin; > update t1 set b = b+1; > > -- Session 2 > merge into t1 using (values(1,1),(2,2)) as t3(a,b) on (t1.a = t3.a) > when matched then > update set b = t1.b + 1 > when not matched then > insert (a,b) values (1,1); > > -- Session 1 > commit; > > This works fine in PG17, but fails with a PK violation in PG18. > Git-bisecting points to this commit: > > cbc127917e04a978a788b8bc9d35a70244396d5b is the first bad commit > commit cbc127917e04a978a788b8bc9d35a70244396d5b > Author: Amit Langote <amitlan@postgresql.org> > Date: Fri Feb 7 17:15:09 2025 +0900 > > Track unpruned relids to avoid processing pruned relations > > Doing a little more debugging, it looks like the problem might be this > change in InitPlan(): > > - /* ignore "parent" rowmarks; they are irrelevant at runtime */ > - if (rc->isParent) > + /* > + * Ignore "parent" rowmarks, because they are irrelevant at > + * runtime. Also ignore the rowmarks belonging to child tables > + * that have been pruned in ExecDoInitialPruning(). > + */ > + if (rc->isParent || > + !bms_is_member(rc->rti, estate->es_unpruned_relids)) > continue; > > which seems to cause it to incorrectly skip a rowmark, which I suspect > is what is causing EvalPlanQual() to return the wrong result. Thanks for the detailed analysis and adding me to the thread, Dean. I would think that a case that involves no partitioning at all would be untouchable by this code, but it looks like the logic I added is incorrectly affecting cases where pruning isn’t even relevant. I’ll need to look more carefully at why such a rowmark would exist in the rowmarks list if its relation isn’t in es_unpruned_relids. Maybe the set population is incorrect at some point, or perhaps it matters that the set is a copy in the EPQ estate. I’m afk (on vacation) at the moment, so won’t be able to dig into this until next week. — Amit >