Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

Vlad Lesin <vladlesin@gmail.com>

From: Vlad Lesin <vladlesin@gmail.com>
To: Michael Paquier <michael@paquier.xyz>, Andrey Borodin <x4mmm@yandex-team.ru>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2026-05-15T12:29:30Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Fix procLatch ownership race in ProcKill()

Attachments

Hello Michael, thank you for your review.

I am sending a new series of patches that incorporate your code review 
notes. The version prefix is "vVL2-" to distinguish Andrey's patches 
from mine.

On 5/11/26 04:58, Michael Paquier wrote:
> On Thu, May 07, 2026 at 04:57:28PM +0500, Andrey Borodin wrote:
>> Done so. I still keep Vlad's injection_point_condition.h though.
>> It seems useful. But no more Meson\Makefile changes, and new sql stuff lives
>> now near removable_cutoff() SQL.
> 
> Because you want to have your callbacks in a new regress_prockill.c
> while being able to reuse the structures.  I don't see why not on
> clarity ground.  The creation of injection_point_condition.h could be
> done in its own commit, independent of the rest, but that's just me
> being pedantic about such matters.  I think that this should be
> backpatched anyway, that could make the introduction of new tests
> easier.

I moved the creation of injection_point_condition.h into a separate 
commit in 0001 patch. Additionally, I noticed that if several processes 
are waiting on the same injection point, only one of them will be 
awakened by a single injection_points_wakeup() call. I am not sure if 
this behavior is intentional; please let me know.

For context, MySQL has a similar feature called debug sync points [0], 
which allows waking up several threads waiting on them. I suppose this 
same semantics would be useful for injection points as well to simplify 
tests. I implemented this behavior in the 0002 patch and utilized it in 
0003. If you disagree with this approach, I can rewrite the test to use 
two separate injection points for the lock group leader and follower.

> +# Two backends form a lock group, then are terminated concurrently. The test
> +# uses injection points placed inside ProcKill() to pause both victims there
> +# and verify that a freshly forked backend can claim the recycled PGPROC slot
> +# without hitting "latch already owned by PID ..." PANIC.
> 
> This claim is slightly incorrect here.  If I undo the fix, place the
> two INJECTION_POINT macros and re-run the test, the test does not
> produce a PANIC.  In order to reproduce a PANIC, it seems that we
> would need something more complicated with an extra point after a
> DisownLatch().  That is not required to me, still this is misleading
> and could be improved, I guess..

Yes, you are right, I fixed the race in 0003 patch with additional 
injection point.

>  I think that we should refactor the patch into two pieces,
> for clarity:
> - Patch 1 refactors the freelist manipulations and introduces the two
> new boolean flags, documenting more precily how the freelists are
> manipulated and why things are done this way (leader, self, etc.).
> - Patch 2, which is about the earlier DisownLatch(), then becomes a
> fix of its own.  That's the primary fix we care about, to avoid the
> entries to be recycled due to a too-early PGPROC entry marked as
> available.

Done. I have split this into two separate patches, 0004 and 0005. The 
test in 0003 fails on patch 0004 and passes on 0005.

> Also, I suspect that the test is racy?  For one, I strongly suspect
> that you are going to need tweaks similar 011_lock_stats.pl in
> test_misc where we rely on some query_until() with psql banners, to
> make sure that the waits actually happen..  I got the point about
> pg_stat_activity not being something we can rely on here.  That's
> something that would be easier to see on an overloaded machine.

The test should no longer be racy following the changes in 0003.

-- 
Best regards,
Vlad

[0] https://dev.mysql.com/doc/dev/mysql-server/9.6.0/PAGE_DEBUG_SYNC.html