Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

Andrey Borodin <x4mmm@yandex-team.ru>

From: Andrey Borodin <x4mmm@yandex-team.ru>
To: Vlad Lesin <vladlesin@gmail.com>, Michael Paquier <michael@paquier.xyz>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2026-05-06T09:51:00Z
Lists: pgsql-hackers

Attachments


> On 5 May 2026, at 21:31, Vlad Lesin <vladlesin@gmail.com> wrote:
> 
> It might make sense to commit the test.

Let's try to work on it, maybe we can bring it to good shape to consider for
a commit.

cc to Michael:

prockill_race needs to build the same InjectionPointCondition payload that
injection_wait consumes to know which PID to block. The struct is currently
private to injection_points.c, so the patch extracts it into a small header
that prockill_race.c includes via a relative "../injection_points/" path.
That works but feels non-idiomatic. Since injection_points grows organically
to support new bug reproducers anyway, making the condition type part of its
public header seems like a natural fit - but we are not sure the fix is
committable as-is, so we wanted to ask before doing any more cleanup: is
this refactor acceptable at all, and if so, would you prefer a proper
installed header (as contrib/pg_plan_advice does) over the relative include?

(attached step 0001, other steps are not about injection points infrastructure)

To Vlad:

I simplified the test as much as I could.  The hand-rolled polling loop is
replaced with poll_query_until, big inline comments explaining the
controller-session trick and the PGPROC-scan rationale are moved into the
C helper functions where the mechanism lives, and outcome classification is
two plain ok() calls.

The fix is intact.  I did not find a way to make it simpler - early
DisownLatch, decisions under leader_lwlock, deferred pushes under
freeProcsLock, and the leader-skips-self-push case each address a distinct
invariant the bug violated.  What I have not fully reasoned through is
whether the new ordering affects any surrounding invariants I might have
missed, so a second further review of a bug fix would be good.

Also maybe consider alternative possible names to prockill_race that would be
idiomatic to stuff in test modules... It's kind of not relevant to current stage of the
patch, but anyway.

Thank you!


Best regards, Andrey Borodin.