Thread
-
Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update
Tender Wang <tndrwang@gmail.com> — 2025-12-24T12:07:00Z
Amit Langote <amitlangote09@gmail.com> 于2025年12月24日周三 16:08写道: > Hi, > > On Tue, Dec 23, 2025 at 4:07 Dean Rasheed <dean.a.rasheed@gmail.com> > wrote: > >> On Mon, 22 Dec 2025 at 14:51, Bh W <wangbihua.cn@gmail.com> wrote: >> > >> > The issue is that the MERGE INTO match condition is not updated. >> > In the MATCHED path of MERGE INTO, when the target row satisfies the >> match condition and the condition itself has not changed, the system should >> still be able to handle concurrent updates to the same target row by >> relying on EvalPlanQual (EPQ) to refetch the latest version of the tuple, >> and then proceed with the intended update. >> > However, in the current implementation, even though the concurrent >> update does not modify any columns relevant to the ON condition, the EPQ >> recheck unexpectedly results in a match condition failure, causing the >> update path that should remain MATCHED to be treated as NOT MATCHED. >> >> I spent a little time looking at this, and managed to reduce the >> reproducer test case down to this: >> >> -- Setup >> drop table if exists t1,t2; >> create table t1(a int primary key, b int); >> create table t2(a int, b int); >> >> insert into t1 values(1,0),(2,0); >> insert into t2 values(1,1),(2,2); >> >> -- Session 1 >> begin; >> update t1 set b = b+1; >> >> -- Session 2 >> merge into t1 using (values(1,1),(2,2)) as t3(a,b) on (t1.a = t3.a) >> when matched then >> update set b = t1.b + 1 >> when not matched then >> insert (a,b) values (1,1); >> >> -- Session 1 >> commit; >> >> This works fine in PG17, but fails with a PK violation in PG18. >> Git-bisecting points to this commit: >> >> cbc127917e04a978a788b8bc9d35a70244396d5b is the first bad commit >> commit cbc127917e04a978a788b8bc9d35a70244396d5b >> Author: Amit Langote <amitlan@postgresql.org> >> Date: Fri Feb 7 17:15:09 2025 +0900 >> >> Track unpruned relids to avoid processing pruned relations >> >> Doing a little more debugging, it looks like the problem might be this >> change in InitPlan(): >> >> - /* ignore "parent" rowmarks; they are irrelevant at runtime */ >> - if (rc->isParent) >> + /* >> + * Ignore "parent" rowmarks, because they are irrelevant at >> + * runtime. Also ignore the rowmarks belonging to child >> tables >> + * that have been pruned in ExecDoInitialPruning(). >> + */ >> + if (rc->isParent || >> + !bms_is_member(rc->rti, estate->es_unpruned_relids)) >> continue; >> >> which seems to cause it to incorrectly skip a rowmark, which I suspect >> is what is causing EvalPlanQual() to return the wrong result. > > > Thanks for the detailed analysis and adding me to the thread, Dean. > > I would think that a case that involves no partitioning at all would be > untouchable by this code, but it looks like the logic I added is > incorrectly affecting cases where pruning isn’t even relevant. I’ll need to > look more carefully at why such a rowmark would exist in the rowmarks list > if its relation isn’t in es_unpruned_relids. Maybe the set population is > incorrect at some point, or perhaps it matters that the set is a copy in > the EPQ estate. > I did some debugging, and I found that: In add_rte_to_flat_rtable(), the RTE of value was not added into glob->AllRelids, because below codes: ..... if (newrte->rtekind == RTE_RELATION || (newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid))) { glob->relationOids = lappend_oid(glob->relationOids, newrte->relid); glob->allRelids = bms_add_member(glob->allRelids, list_length(glob->finalrtable)); } .... The VALUE rte was not satisfied above if, so it was not added into the glob->allRelids. Then in standard_planner(), we have: .... result->unprunableRelids = bms_difference(glob->allRelids, glob->prunableRelids); .... So the result->unprunableRelids contains only 1. In InitPlan(), we have: ..... /* * Ignore "parent" rowmarks, because they are irrelevant at * runtime. Also ignore the rowmarks belonging to child tables * that have been pruned in ExecDoInitialPruning(). */ if (rc->isParent || !bms_is_member(rc->rti, estate->es_unpruned_relids)) continue; ..... the estate->es_unpruned_relids equals with result->unprunableRelids contains. So the rowMark was skipped incorrectly. I did a quick fix as the attached patch. Any thoughts? -- Thanks, Tender Wang