Thread

  1. Fix crash during recovery when redo segment is missing

    Nitin Jadhav <nitinjadhavpostgres@gmail.com> — 2025-02-21T10:59:27Z

    Hi,
    
    In [1], Andres reported a bug where PostgreSQL crashes during recovery
    if the segment containing the redo pointer does not exist. I have
    attempted to address this issue and am sharing a patch for it.
    
    The problem is that when the redo LSN and the checkpoint LSN are in
    different segments and the segment containing the redo LSN is missing,
    PostgreSQL does not PANIC up front; it proceeds with recovery and
    crashes later. Andres has provided a detailed analysis of the behavior
    across different settings and versions; please refer to [1] for more
    information.
    
    The patch fixes this by verifying that the REDO location exists as
    soon as the checkpoint record has been read successfully in
    InitWalRecovery(). This prevents control from reaching
    PerformWalRecovery() unless the WAL file containing the redo record
    exists. A new test script, 044_redo_segment_missing.pl, has been added
    to validate this. To place the redo record in a different WAL file
    from the checkpoint record, the test waits for the checkpoint start
    message and then issues pg_switch_wal(), which should happen before
    the checkpoint completes. It then crashes the server; on restart, the
    server should log an appropriate error indicating that it could not
    find the redo location. Please let me know if there is a better way to
    reproduce this behavior. I have tested and verified this with the
    various scenarios Andres pointed out in [1]. Note that this patch does
    not address error checking in StartupXLOG(), CreateCheckPoint(), etc.,
    nor does it attempt to clean up existing code.
    
    Attaching the patch. Please review and share your feedback. Thanks to
    Andres for spotting the bug and providing the detailed report [1].
    
    [1]: https://www.postgresql.org/message-id/20231023232145.cmqe73stvivsmlhs%40awork3.anarazel.de
    
    Best Regards,
    Nitin Jadhav
    Azure Database for PostgreSQL
    Microsoft