Thread
-
Re: Deadlock detector fails to activate on a hot standby replica
Fujii Masao <masao.fujii@gmail.com> — 2026-05-26T15:01:38Z
On Mon, May 25, 2026 at 11:20 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote: > > Dear Fujii Masao, > > Thank you for the review of the patch. Based on your comments I propose > a new version of the patch. Thanks for updating the patch! + if (got_standby_delay_timeout) + SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN); + else if (got_standby_deadlock_timeout) + { Shouldn't we break out of the loop when either got_standby_delay_timeout or got_standby_deadlock_timeout becomes true? Otherwise, the loop continues with those flags still set, which could cause SendRecoveryConflictWithBufferPin() to be called unnecessarily in the subsequent cycles. + if (BufferGetRefCount(buffer) <= 1) Should this be "BufferGetRefCount(buffer) == 1" instead? I don't think BufferGetRefCount(buffer) should ever return 0 here. If that's correct, would it make sense to explicitly detect that case, for example: ----------------- uint32 refcount = BufferGetRefCount(buffer); Assert(refcount > 0); if (refcount == 0) elog(ERROR, "buffer refcount dropped to zero while waiting for cleanup lock"); if (refcount == 1) break; ----------------- > I also added a new tap-test as a part of the patch. I did some changes in the > tap test to make it stable. Let me know, please, if it should be in a separate > commit. Do we really need a new TAP test file for this? We already have a startup/backend deadlock test in t/031_recovery_conflict.pl. Extending the deadlock section there to cover this test case seems simpler and easier to follow than introducing a new t/053_* test. Regards, -- Fujii Masao