Thread
-
Re: Optimize LISTEN/NOTIFY
Joel Jacobson <joel@compiler.org> — 2025-11-15T21:53:52Z
On Fri, Nov 14, 2025, at 17:01, Joel Jacobson wrote: > On Thu, Nov 13, 2025, at 08:13, Joel Jacobson wrote: >> Attached, please find a new version rebased on top of the bug fix >> patches that just got committed in 0bdc777, 797e9ea, 8eeb4a0, and >> 1b46990. > > To help reviewers, here is a new write-up of the patch: > [...write-up...] While reviewing all the comments in async.c to make sure they match the patch's code, I noticed a few discrepancies. One of them was the comment above QUEUE_CLEANUP_DELAY, which again made me think about how master currently uses that value as the threshold for when to "wake laggers". In this patch, QUEUE_CLEANUP_DELAY is no longer used for that purpose; it now only determines how often we try to advance the tail pointer. I realize someone who is familiar with the current code in master, might ask the following question: Why not do direct advancement, but just use the old "wake laggers" logic for listeners that lag behind more than QUEUE_CLEANUP_DELAY? On the surface it might look like a plausible alternative, since that's what master currently does (but for other databases). I was curious to see how such alternative approach would affect performance, so I changed SignalBackends and ran some benchmarks. Below is the v27 logic: ``` if (QUEUE_BACKEND_IS_ADVANCING(i) ? QUEUE_POS_PRECEDES(QUEUE_BACKEND_ADVANCING_POS(i), queueHeadAfterWrite) : QUEUE_POS_PRECEDES(pos, queueHeadBeforeWrite)) { Assert(pid != InvalidPid); QUEUE_BACKEND_WAKEUP_PENDING(i) = true; pids[count] = pid; procnos[count] = i; count++; } else if (!QUEUE_BACKEND_IS_ADVANCING(i) && QUEUE_POS_PRECEDES(pos, queueHeadAfterWrite)) { Assert(!QUEUE_POS_PRECEDES(pos, queueHeadBeforeWrite)); QUEUE_BACKEND_POS(i) = queueHeadAfterWrite; } ``` And here is the modified "wake laggers" version I tested: ``` if (!QUEUE_BACKEND_IS_ADVANCING(i) && !QUEUE_POS_PRECEDES(pos, queueHeadBeforeWrite) && QUEUE_POS_PRECEDES(pos, queueHeadAfterWrite)) { QUEUE_BACKEND_POS(i) = queueHeadAfterWrite; } else if (asyncQueuePageDiff(QUEUE_POS_PAGE(QUEUE_HEAD), QUEUE_POS_PAGE(pos)) >= QUEUE_CLEANUP_DELAY) { QUEUE_BACKEND_WAKEUP_PENDING(i) = true; pids[count] = pid; procnos[count] = i; count++; } ``` This preserves direct advancement under the same conditions as the patch, but only sends signals to backends that are "laggers" by the QUEUE_CLEANUP_DELAY threshold, similar to master's behavior. It turns out the "wake laggers" approach is significantly slower: "wake laggers": ./pg_async_notify_test --listeners 1 --notifiers 1 --channels 1000 --sleep 0.01 --sleep-exp 1.01 20 s: 80871 sent (4650/s), 80871 received (4650/s) Notification Latency Distribution: 0.00-0.01ms 0 (0.0%) avg: 0.000ms 0.01-0.10ms # 380 (0.5%) avg: 0.085ms 0.10-1.00ms # 2765 (3.4%) avg: 0.289ms 1.00-10.00ms # 4947 (6.1%) avg: 5.870ms 10.00-100.00ms ####### 57467 (71.1%) avg: 52.101ms >100.00ms # 15312 (18.9%) avg: 119.847ms v27: % ./pg_async_notify_test --listeners 1 --notifiers 1 --channels 1000 --sleep 0.01 --sleep-exp 1.01 20 s: 229866 sent (13985/s), 229865 received (13984/s) Notification Latency Distribution: 0.00-0.01ms 0 (0.0%) avg: 0.000ms 0.01-0.10ms # 18351 (8.0%) avg: 0.072ms 0.10-1.00ms # 32899 (14.3%) avg: 0.315ms 1.00-10.00ms #### 103273 (44.9%) avg: 5.055ms 10.00-100.00ms ### 75342 (32.8%) avg: 18.197ms >100.00ms 0 (0.0%) avg: 0.000ms I reran the experiments a three times with similar results. Also tested other test permutaions, that also showed "wake laggers" was worse. At first glance, both approaches ultimately signal all backends interested in our notifies, so it may seem surprising that latency differs this much. The key point is what happens to non-interested backends: A backend that is *not* listening to our channels may have stopped at a position before the old queue head, because it woke up and read the head before all notifies for a previous commit were written. Such backend might actually be interested in the notifies that lie in between its current position and the old queue head, and it could therefore be urgent to wake it up to make it process the queue and delivery the notifies. In v27, such a backend gets signaled on the next NOTIFY, when it notice it is stationary at a pos behind the queueHeadBeforeWrite. With "wake laggers", however, it receives no signal until it becomes a "lagger" by QUEUE_CLEANUP_DELAY pages, which can be far later. This risks delaying its processing and the delivery of notifications it is interested in. In master this is fine because "wake laggers" is only applied to backends in other databases that we know are not interested in our notifications. In conclusion, the "wake laggers" mechanism seems inherently incompatible with the direct advancement mechanism, and I therefore think the current approach in v27 is sound. I just wanted to share this reasoning, not because anyone has raised any concerns about this, but because I hadn't fully internalized this myself, and thought the reasoning could possibly be helpful to others. The attached v28 is the same as v27, except some comments have been fixed to accurately reflect the code. /Joel