Thread

  1. [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL

    Srinath Reddy Sadipiralla <srinath2133@gmail.com> — 2025-09-06T16:33:45Z

    Hi all,
    
    While working with pg_rewind, I noticed that it can sometimes request a
    rewind even when no actual changes exist after a failover.
    
    *Problem:*
    Currently, pg_rewind determines the end-of-WAL on the target by using the
    last shutdown checkpoint (or minRecoveryPoint for a standby). This creates
    a false positive scenario:
    
    1) Suppose a standby is promoted to become the new primary.
    2) Later, the old primary is cleanly shut down.
    3) The only WAL record generated on the old primary after divergence is a
    shutdown checkpoint.
    
    At this point, the old primary has no data changes past the divergence point.
    However, since the shutdown checkpoint extends the WAL past the divergence
    point, pg_rewind concludes:
    
    if (target_wal_endrec > divergerec)
        rewind_needed = true;
    
    That forces a rewind even though there are no meaningful changes.
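
    To make the false positive concrete, here is a minimal standalone sketch of
    that comparison. It is not the pg_rewind source: LsnSketch,
    rewind_needed_current() and the LSN values used with it are made up for
    illustration, and only the comparison itself is taken from the snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (assumption: a plain 64-bit LSN). */
typedef uint64_t LsnSketch;

/*
 * Hypothetical model of the current decision: any WAL on the target past
 * the divergence point counts as a reason to rewind, whatever it contains.
 */
static bool
rewind_needed_current(LsnSketch target_wal_endrec, LsnSketch divergerec)
{
    if (target_wal_endrec > divergerec)
        return true;

    /* Otherwise the target's WAL is expected to end exactly at divergence. */
    assert(target_wal_endrec == divergerec);
    return false;
}
```

    With an illustrative divergence point of 0x3000060 and a shutdown checkpoint
    ending at 0x3000098, rewind_needed_current(0x3000098, 0x3000060) returns
    true, although the checkpoint is the only record written after divergence.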
    
    To *reproduce this scenario*, use the attached script.
    
    *Fix:*
    The attached patch changes the logic so that pg_rewind no longer treats
    shutdown checkpoints as meaningful records when determining the end-of-WAL.
    Instead, we scan backward from the last checkpoint until we find the most
    recent valid WAL record that is not purely shutdown-related.
    
    This ensures rewind is only triggered when there are actual modifications
    after divergence, avoiding unnecessary rewinds in clean failover scenarios.
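
    The backward scan can be sketched roughly as follows. This is a simplified
    model under stated assumptions, not the code in the patch: the WAL written
    after divergence is represented as an in-memory array instead of being read
    through the WAL reader, and WalRecSketch, its shutdown_only flag,
    effective_wal_end() and rewind_needed_sketch() are hypothetical names. Which
    record types count as shutdown-only is decided by the patch, not here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (assumption: a plain 64-bit LSN). */
typedef uint64_t LsnSketch;

/* Hypothetical, heavily simplified WAL record. */
typedef struct
{
    LsnSketch end_lsn;        /* position just past the end of this record */
    bool      shutdown_only;  /* true for e.g. a shutdown checkpoint */
} WalRecSketch;

/*
 * recs holds the records written on the target after divergence, in WAL
 * order. Walk backward from the newest one and return the end of the most
 * recent record that is not shutdown-only. If there is none, the effective
 * end-of-WAL is the divergence point itself.
 */
static LsnSketch
effective_wal_end(const WalRecSketch *recs, size_t n, LsnSketch divergerec)
{
    assert(recs != NULL || n == 0);

    for (size_t i = n; i > 0; i--)
    {
        if (!recs[i - 1].shutdown_only)
            return recs[i - 1].end_lsn;
    }
    return divergerec;
}

/* Rewind only if a meaningful record exists past the divergence point. */
static bool
rewind_needed_sketch(const WalRecSketch *recs, size_t n, LsnSketch divergerec)
{
    return effective_wal_end(recs, n, divergerec) > divergerec;
}
```

    In this model, a target whose only post-divergence record is a shutdown
    checkpoint yields an effective end equal to the divergence point, so no
    rewind is requested. A single ordinary record ahead of that checkpoint is
    enough to request one.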
    
    
    -- 
    Thanks,
    Srinath Reddy Sadipiralla
    EDB: https://www.enterprisedb.com/