Thread

  1. Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits

    Ilmar Y <tanswis42@gmail.com> — 2026-05-30T11:51:30Z

    The following review has been posted through the commitfest application:
    make installcheck-world:  not tested
    Implements feature:       tested, failed
    Spec compliant:           not tested
    Documentation:            not tested
    
    Hi,
    
    I looked at v2, focusing on the latest question about
    log_recovery_conflict_waits behavior.
    
    The patch applies cleanly on current master at
    db5ed03217b9c238703df8b4b286115d6e940488.  git diff --check reports no
    issues.  I built both master and v2 locally with:
    
    ./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu
    make -s -j8
    make -s install
    
    Both builds passed.
    
    I tried to check the case Zsolt described.  In my local repro, the final
    cancellation still happens at about max_standby_streaming_delay in both
    master and v2.  The visible difference is that v2 delays the first
    log_recovery_conflict_waits "recovery still waiting" message.
    
    I used a small primary/standby setup based on the buffer-pin conflict pattern
    from src/test/recovery/t/031_recovery_conflict.pl:
    
    log_recovery_conflict_waits = on
    deadlock_timeout = 100ms
    max_standby_streaming_delay = 5s
    
    The scenario is:
    
    1. Create a small table.
    2. Generate an aborted tuple on the primary and replay it on the standby.
    3. Open a cursor on the standby and fetch one row, leaving the transaction
       idle with the cursor open.
    4. Run VACUUM FREEZE on the primary.
    
    On current master, the standby log shows:
    
    recovery still waiting after 101.060 ms: recovery conflict on buffer pin
    terminating connection due to conflict with recovery
    recovery finished waiting after 5001.324 ms: recovery conflict on buffer pin
    
    With v2, the standby log shows:
    
    recovery still waiting after 5001.064 ms: recovery conflict on buffer pin
    terminating connection due to conflict with recovery
    recovery finished waiting after 5001.535 ms: recovery conflict on buffer pin
    
    So in this repro the early "still waiting" log at deadlock_timeout is lost,
    and the first waiting log is emitted only when the wait is already being
    resolved at about max_standby_streaming_delay.
    
    This seems to come from the new second ProcWaitForSignal() inside
    ResolveRecoveryConflictWithBufferPin().  After the deadlock timeout fires, v2
    sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and waits again internally.  For a
    non-deadlock buffer pin holder, LockBufferForCleanup() does not get control
    back after deadlock_timeout, so it cannot emit the early recovery conflict
    log message.
    
    This looks similar to the reason ResolveRecoveryConflictWithLock() has the
    logging_conflict argument: if the conflict still needs to be logged, it returns
    after sending the deadlock-check signal, and only avoids the repeated wait on
    the later call after the conflict has been logged.
    
    Should ResolveRecoveryConflictWithBufferPin() preserve the same behavior, for
    example by skipping the new second wait until LockBufferForCleanup() has had a
    chance to emit the first log_recovery_conflict_waits message?
    
    I have not checked the repeated SIGUSR1 count with strace in my local macOS
    environment, and I have not reviewed the backpatching question.
    
    Regards,
    Ilmar Yunusov
    
    The new status of this patch is: Waiting on Author