Thread

  1. Re: Startup PANIC on standby promotion due to zero-filled WAL segment

    Alena Vinter <dlaaren8@gmail.com> — 2025-12-23T09:33:30Z

    Hi Michael,
    
    Thanks for the review. To clarify: TLI 1 does not diverge; it is fully
    replicated to the standby before the timeline switch. The test then
    intentionally slows down replication on TLI 2 (e.g., by delaying WAL
    shipping) to reproduce the scenario I illustrated. As far as I’m aware,
    `fsync` is `on` by default, and the test does not modify it, so no WAL
    records are lost due to unsafe flushing.
    
    The core issue is that the new timeline’s segment is zero-initialized
    instead of being initialized as a copy of the same segment from the
    previous timeline (as is done at crash-recovery startup). As a result,
    startup cannot finish recovery: the end of WAL has not been replicated
    yet, so reading it fails with errors such as “invalid magic number”.
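
    A toy Python model may make the difference concrete. It is only a sketch
    under stated assumptions, not PostgreSQL code. It treats a segment as a
    bytearray of fixed-size pages whose header is just a 2-byte magic number.
    PAGE_SIZE, PAGES_PER_SEGMENT, PAGE_MAGIC and all helper names are
    hypothetical. The model compares zero-initializing the new timeline’s
    segment with copying the old timeline’s pages that precede the switch
    point. It then validates a pre-switch page, loosely in the spirit of a
    WAL reader checking a page header.

```python
"""Toy model (not PostgreSQL code) of initializing a new timeline's WAL
segment when the timeline switch falls in the middle of that segment."""

PAGE_SIZE = 64         # hypothetical; real WAL pages are far larger
PAGES_PER_SEGMENT = 4  # hypothetical
PAGE_MAGIC = 0xABCD    # hypothetical placeholder, not PostgreSQL's real value


def stamp_page(segment: bytearray, page_no: int) -> None:
    """Write a valid header (here: only the magic number) into one page."""
    off = page_no * PAGE_SIZE
    segment[off:off + 2] = PAGE_MAGIC.to_bytes(2, "little")


def check_page(segment: bytes, page_no: int) -> str:
    """Validate a page header in miniature: only the magic number is checked."""
    off = page_no * PAGE_SIZE
    magic = int.from_bytes(segment[off:off + 2], "little")
    if magic != PAGE_MAGIC:
        return f"invalid magic number {magic:04X} in page {page_no}"
    return "ok"


def old_timeline_segment(written_pages: int) -> bytearray:
    """Old-timeline segment with the first `written_pages` pages written."""
    seg = bytearray(PAGE_SIZE * PAGES_PER_SEGMENT)
    for p in range(written_pages):
        stamp_page(seg, p)
    return seg


def new_timeline_segment(old: bytes, switch_page: int, copy_prefix: bool) -> bytearray:
    """Create the new timeline's segment for a mid-segment switch point.

    copy_prefix=False: zero-initialize the whole file.
    copy_prefix=True:  copy the old timeline's pages that precede the switch.
    """
    seg = bytearray(PAGE_SIZE * PAGES_PER_SEGMENT)
    if copy_prefix:
        end = switch_page * PAGE_SIZE
        seg[:end] = old[:end]
    return seg


if __name__ == "__main__":
    old = old_timeline_segment(written_pages=2)  # pages 0-1 hold old-TLI WAL
    zeroed = new_timeline_segment(old, switch_page=2, copy_prefix=False)
    copied = new_timeline_segment(old, switch_page=2, copy_prefix=True)
    print(check_page(zeroed, 1))  # invalid magic number 0000 in page 1
    print(check_page(copied, 1))  # ok
```

    In the zero-initialized file the pre-switch page carries magic 0000; with
    the copied prefix the same page validates, while pages past the switch
    point stay zero in both variants until they are replicated.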
    
    ---
    Alena Vinter