Thread

Re: Optimize LISTEN/NOTIFY

Joel Jacobson <joel@compiler.org> — 2025-11-23T20:43:34Z
On Sun, Nov 23, 2025, at 16:49, Joel Jacobson wrote:
> On Sat, Nov 22, 2025, at 22:30, Joel Jacobson wrote:
>> On Thu, Nov 20, 2025, at 21:26, Tom Lane wrote:
>>> I took a brief look through the v28 patch, and I'm fairly distressed
>>> at how much new logic has been stuffed into what's effectively a
>>> critical section.  It's totally not okay for AtCommit_Notify or
>>> anything it calls to throw an error
>>
>> Right, I agree. Thanks for guidance.
>>
>>> So I think there needs to be a serious effort made to move as
>>> much as we possibly can of the potentially-risky stuff into
>>> PreCommit_Notify.  In particular I think we ought to create
>>> the shared channel hash entry then, and even insert our PID
>>> into it.  We could expand the listenersArray entries to include
>>> both a PID and a boolean "is it REALLY listening?", and then
>>> during Exec_ListenCommit we'd only be required to find an
>>> entry we already added and set its boolean, so there's no OOM
>>> hazard.  Possibly do something similar with the local
>>> listenChannelsHash, so as to remove that possibility of OOM
>>> failure as well.
>>
>> Thanks for the idea, I like this approach. I've expanded the
>> listenersArray like suggested, and moved all risky stuff from
>> Exec_ListenCommit to PreCommit_Notify.
>>
>>> (An alternative design could be to go ahead and insert our
>>> PID during PreCommit_Notify, and just tolerate the small
>>> risk of getting signaled when we didn't need to be.  But
>>> then we'd need some mechanism for cleaning out the bogus
>>> entry during AtAbort_Notify.)
>>
>> We seem to need a cleanup mechanism also with the boolean field design,
>> since a channel could end up being added only to listenChannelsHash, but
>> not channelHash, which would trick IsListeningOn() into falsely thinking
>> we're listening on such channel when we're not. This could happen if
>> successfully adding the channel to listenChannelsHash, but OOM when
>> trying to add it to channelHash.
>>
>> AtAbort_Notify now handles such half-state, by reconciling all channels
>> that had LISTEN_LISTEN actions, using the channelHash as the single
>> source of truth, removing channels from both listenChannelsHash and
>> channelHash, unless the active field is true (which means we were
>> already listening to the channel due to a previous transaction).
>>
>> I've tested triggering the cleanup logic by adding elog ERROR that
>> triggered after listenChannelsHash insert, and another test that
>> triggered after channelHash insert, and it cleaned it up correctly. I
>> haven't created a test for it in tree though, would we want that?
>>
>>> I'm not sure what I think about the new logic in SignalBackends
>>> from this standpoint.  Making it very-low-probability-of-error
>>> definitely needs some consideration though.
>>
>> The initChannelHash call in SignalBackends is now gone, moved to
>> PreCommit_Notify (called if there are any pendingNotifies).
>>
>> I also took the liberty of fixing the XXX comment, to lazily preallocate
>> the signals arrays during PreCommit_Notify. It felt too inconsistent to
>> just leave that unfixed, but maybe should be a separate commit?
>
> I've extracted the preallocation of signals arrays into a separate patch:
> https://commitfest.postgresql.org/patch/6248/

New version of the optimization patch, without the preallocation of
signals arrays part (since submitted as a separate patch instead).

>> I wonder how risky the remaining new logic in SignalBackends is. For
>> instance, I looked at dshash_find(..., false), and note it calls
>> LWLockAcquire which in turn could elog ERROR if num locks is exceeded,
>> but in master we're already calling LWLockAcquire in SignalBackends, so
>> should be fine I guess?
>>
>> Apart from that, I don't see any new logic in SignalBackends, that could
>> potentially be risky.

/Joel