Thread

  1. Some problems about cascading replication

    Fujii Masao <masao.fujii@gmail.com> — 2011-08-16T08:55:49Z

    Hi,
    
    When I tested PITR on git master with max_wal_senders > 0,
    I found that the following inappropriate log message was always
    output even though cascading replication was not in progress. The
    attached patch fixes this problem.
    
        LOG:  terminating all walsender processes to force cascaded standby(s) to update timeline and reconnect
    
    While making the patch, I found another problem with cascading
    replication. When a cascading standby is promoted, postmaster sends
    SIGUSR2 to all cascading walsenders to kill them. But there is a
    corner case where such a walsender fails to receive SIGUSR2 and
    unexpectedly survives the standby promotion. This happens when
    postmaster sends SIGUSR2 before the walsender has marked itself as
    a WAL sender, because postmaster sends SIGUSR2 only to processes
    already marked as WAL senders.
    
    To avoid the corner case, I changed walsender so that it checks
    again whether recovery is still in progress after marking itself
    as a WAL sender. If recovery is no longer in progress even though the
    walsender is a cascading one, it does the same thing the SIGUSR2
    signal handler does, and then exits later. The attached patch also
    includes this fix.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center