
Re: injection_points: Switch wait/wakeup to use atomics rather than latches

Michael Paquier <michael@paquier.xyz> — 2026-05-28T23:19:42Z
On Thu, May 28, 2026 at 08:40:39AM -0400, Robert Haas wrote:
> After reading this email, the linked-to email, and the commit message
> for the patch, I still don't have a clear understanding of what this
> is intended to fix. It seems like it's going to make the
> responsiveness worse. In general, we want to replace escalating wait
> loops with things that wake up instantly at the right time, and this
> is going in the opposite direction.

This is a trade-off between the responsiveness of the system and
flexibility.  I have had two complaints in the past that waits and
wakeups were not possible in some contexts because we rely on
condition variables and latches:
- The postmaster context (lack of DSM access, for one).  Heikki once
mentioned to me that this was annoying, at least when hacking on
protocol-level tests there.
- The second case, shown on the previous thread, was a tricky
scenario involving the termination of backends.

One limitation relates to wait event visibility: the wait may not be
visible in pg_stat_activity.  We could simply add a LOG entry in
injection_wait() once the old count is read, and rely on a server log
lookup in the TAP tests where pg_stat_activity cannot be used.

Compared to redesigning all the facilities that injection_points
relies on, this patch struck me as a good balance between
responsiveness (a minimum of 10us and a maximum of 100ms between
checks) and portability.  The minimum threshold does not really
matter much in terms of runtime on fast machines.
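To make the trade-off concrete, here is a minimal sketch of an atomics-based wait/wakeup with an escalating sleep, assuming a shared per-point wakeup counter.  This is not the patch's code: it uses C11 <stdatomic.h> and POSIX nanosleep() instead of PostgreSQL's pg_atomic_* and pg_usleep(), and the names inj_wait_slot, inj_wait(), inj_wakeup() and delayed_wakeup(), as well as the doubling factor, are hypothetical.  Only the 10us/100ms bounds come from the discussion above.  The waiter records the old count (the point where a LOG entry could be emitted), then polls until the count changes, doubling its sleep up to the cap.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Bounds quoted in the thread; the doubling policy below is an assumption. */
#define WAIT_MIN_USEC 10L
#define WAIT_MAX_USEC 100000L

/* Hypothetical shared state: one wakeup counter per injection point. */
typedef struct
{
    atomic_uint_fast32_t wakeup_count;
} inj_wait_slot;

static void
sleep_usec(long usec)
{
    struct timespec ts;

    ts.tv_sec = usec / 1000000L;
    ts.tv_nsec = (usec % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
}

/*
 * Waiter side: read the current count, then poll until a wakeup bumps it.
 * Only wakeups issued after the count is read are noticed.  Returns the
 * number of sleeps performed.
 */
static unsigned
inj_wait(inj_wait_slot *slot, const char *name)
{
    uint_fast32_t old_count = atomic_load(&slot->wakeup_count);
    long delay = WAIT_MIN_USEC;
    unsigned sleeps = 0;

    /* Stand-in for a server LOG entry that TAP tests could look up. */
    fprintf(stderr, "LOG: waiting on injection point \"%s\"\n", name);

    while (atomic_load(&slot->wakeup_count) == old_count)
    {
        sleep_usec(delay);
        sleeps++;
        delay *= 2;             /* escalate, but never beyond the cap */
        if (delay > WAIT_MAX_USEC)
            delay = WAIT_MAX_USEC;
    }
    return sleeps;
}

/* Waker side: no latch or condition variable, just an atomic increment. */
static void
inj_wakeup(inj_wait_slot *slot)
{
    atomic_fetch_add(&slot->wakeup_count, 1);
}

/* Test helper: wake the slot from another thread after 50ms. */
static void *
delayed_wakeup(void *arg)
{
    sleep_usec(50000L);
    inj_wakeup((inj_wait_slot *) arg);
    return NULL;
}
```

The cost Robert points out is visible here: a wakeup can be noticed up to WAIT_MAX_USEC late, whereas a latch wakes the waiter immediately.  In exchange, the waker needs nothing beyond access to the counter, which is what makes this usable in contexts where latches or condition variables are not available.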

Does this explanation make sense?
--
Michael