Thread

  1. Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

    Andrey Borodin <x4mmm@yandex-team.ru> — 2026-05-05T09:07:17Z

    
    > On 27 Apr 2026, at 13:14, Vlad Lesin <vladlesin@gmail.com> wrote:
    > 
    > Problem
    > ------------------------------------------------------------------------
    > 
    > If a leader detaches from the lock group under leader_lwlock but
    > has not yet reached DisownLatch(&MyProc->procLatch), a concurrent
    > last follower can still put the *leader* PGPROC on a free list, or
    > the leader and the follower can make inconsistent decisions about
    > *who* returns which PGPROC, so that a slot is linked into the free
    > list with procLatch still owned, or is pushed twice.  A new backend
    > that recycles the slot can then hit:
    > 
    >     PANIC: latch already owned by PID ...
    > 
    > A concrete interleaving (lock group leader vs last member)
    > is the following (PG15 code).
    
    Yeah, the problem seems real to me. Moreover, we had related buildfarm
    failures [0] and Deep from GP reported observing the problem there too.
    Yugabyte folks also observed this [1].
    
    The invariant that a PGPROC should not be put on a free list until its
    procLatch is disowned seems reasonable to me.
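
    To make that invariant concrete, here is a minimal standalone sketch in C.
    It is a toy model, not PostgreSQL code: ToyProc and all toy_* functions are
    hypothetical names used only for illustration, locking is not modeled, and
    where the real code would PANIC the sketch just returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a PGPROC slot; NOT the real PostgreSQL struct. */
typedef struct ToyProc
{
    int             latch_owner_pid;    /* 0 means the latch is not owned */
    bool            on_freelist;        /* true while linked into the list */
    struct ToyProc *next;
} ToyProc;

static ToyProc *toy_freelist = NULL;

/* Model of taking latch ownership; fails if the latch is still owned. */
static bool
toy_own_latch(ToyProc *proc, int pid)
{
    if (proc->latch_owner_pid != 0)
        return false;           /* "latch already owned by PID ..." */
    proc->latch_owner_pid = pid;
    return true;
}

/* Model of giving up latch ownership. */
static void
toy_disown_latch(ToyProc *proc)
{
    proc->latch_owner_pid = 0;
}

/*
 * Push a slot onto the free list, enforcing the invariant: the latch must
 * already be disowned and the slot must not be on the list yet.
 */
static bool
toy_freelist_push(ToyProc *proc)
{
    if (proc->latch_owner_pid != 0 || proc->on_freelist)
        return false;
    proc->on_freelist = true;
    proc->next = toy_freelist;
    toy_freelist = proc;
    return true;
}

/* Pop a slot for a new backend; returns NULL if the list is empty. */
static ToyProc *
toy_freelist_pop(void)
{
    ToyProc *proc = toy_freelist;

    if (proc != NULL)
    {
        toy_freelist = proc->next;
        proc->next = NULL;
        proc->on_freelist = false;
    }
    return proc;
}

/* Walk through the scenario; returns the number of unexpected outcomes. */
static int
toy_demo(void)
{
    ToyProc leader = {0, false, NULL};
    int     failures = 0;

    failures += !toy_own_latch(&leader, 100);   /* leader is running */
    failures += toy_freelist_push(&leader);     /* too early: rejected */
    toy_disown_latch(&leader);
    failures += !toy_freelist_push(&leader);    /* now allowed */
    failures += toy_freelist_push(&leader);     /* double push: rejected */
    failures += (toy_freelist_pop() != &leader);
    failures += !toy_own_latch(&leader, 200);   /* new backend reuses slot */
    return failures;
}
```

    With both checks in toy_freelist_push(), a backend that pops a slot can
    always take ownership of its latch. Dropping either check reproduces, in
    miniature, the two failure modes described above: a slot recycled with the
    latch still owned, or a slot pushed twice.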
    
    But both the test and the fix are very confusing here. They are not patch steps,
    as one might expect given the 0001, 0002, 0003 prefixes. They are not based on
    PG 18, as the filenames state.
    
    To help resolve this confusion, I'm posting the following sequence:
    
    1. vAB1-0001-Add-regression-test-for-ProcKill-lock-group-pro.patch
    This is the original test, which is expected to demonstrate the problem.
    It contains heavy refactoring of injection points; I assume it's not intended for commit.
    This test was taken from the file 0003-PG18-unfixed-repro-tap-injection-harness.patch
    
    2. vAB1-0002-Canonicalize-test-with-infrastructure.patch
    My changes needed to make the test runnable.
    
    3. vAB1-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch
    The fix for the problem, as proposed by the thread starter, rebased on current HEAD
    and the test patch.
    The test passes after this step.
    
    I would recommend that the author make the patch leaner and easier to review.
    
    
    Best regards, Andrey Borodin.
    
    [0] https://www.postgresql.org/message-id/flat/CA%2BhUKGJ_0RGcr7oUNzcHdn7zHqHSB_wLSd3JyS2YC_DYB%2B-V%3Dg%40mail.gmail.com
    [1] https://github.com/yugabyte/yugabyte-db/issues/20309