Thread

  1. Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery

    Xuneng Zhou <xunengzhou@gmail.com> — 2026-05-01T02:57:08Z

    Hi Marco,
    
    On Tue, Apr 28, 2026 at 12:50 AM Marco Nenciarini <
    marco.nenciarini@enterprisedb.com> wrote:
    
    > v7 patches attached.  No code changes from v6, just rebased on
    > current master to remove minor offset, and the backpatch file is
    > renamed with a "nocfbot-" prefix so the commitfest bot picks up
    > only the master patch.
    >
    >
    > On Mon, Apr 27, 2026 at 6:00 PM Marco Nenciarini <
    > marco.nenciarini@enterprisedb.com> wrote:
    >
    >> Registered in PG20-1: https://commitfest.postgresql.org/patch/6716/
    >>
    >> On Sat, Mar 21, 2026 at 11:52 AM Marco Nenciarini <
    >> marco.nenciarini@enterprisedb.com> wrote:
    >>
    >>> Here are the v6 patches.
    >>>
    >>> Xuneng correctly pointed out that RequestXLogStreaming rounds down,
    >>> not up, so it isn't the cause of the gap.  The actual mechanism is
    >>> that archive recovery processes whole segment files: after both nodes
    >>> replay the same archived segment N, the cascade's next read position
    >>> lands at the start of segment N+1, while the upstream's
    >>> GetStandbyFlushRecPtr returns replayPtr inside segment N.
    >>>
    >>> Changes from v5:
    >>>
    >>> - Updated the code comment and commit message to describe the correct
    >>>   root cause (archive recovery segment granularity, not
    >>>   RequestXLogStreaming truncation).
    >>>
    >>> - Reset the catchup state when the upstream is no longer behind.
    >>>   Without this, if the walreceiver successfully streams, the
    >>>   connection breaks, and it loops back to find itself ahead again,
    >>>   the stale deadline from the previous wait would cause an immediate
    >>>   timeout.
    >>>
    >>> Two patches attached: v6-0001 for master (extends the
    >>> walrcv_identify_system API) and v6-backpatch-0001 for stable branches
    >>> (global variable to preserve ABI).
    >>>
    >>
    Polling at intervals still doesn't seem ideal to me, but I don't have a
    better idea for now.
    
    -- 
    Best,
    Xuneng
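
For readers following the root-cause description in the quoted v6 notes, the segment-granularity gap can be illustrated with a little LSN arithmetic. This is only a sketch, not PostgreSQL code: the helper names, the sample replay position, and the 16 MB segment size (the default wal_segment_size) are assumptions chosen for illustration.

```python
# Hypothetical illustration of the gap; none of these helpers exist in PostgreSQL.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed: default wal_segment_size (16 MB)


def segment_number(lsn: int) -> int:
    """Return the index of the WAL segment that contains this LSN."""
    return lsn // WAL_SEGMENT_SIZE


def next_segment_start(lsn: int) -> int:
    """Return the first LSN of the segment following the one containing lsn."""
    return (segment_number(lsn) + 1) * WAL_SEGMENT_SIZE


def format_lsn(lsn: int) -> str:
    """Render a 64-bit LSN in the usual hi/lo hexadecimal form."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


# Assumed sample: the last record replayed from archived segment 5 ends here,
# which is what the upstream would report as its flush position.
upstream_replay_ptr = 5 * WAL_SEGMENT_SIZE + 0x1234

# The cascade consumed the same whole segment file, so its next read
# position is the start of the following segment.
cascade_read_ptr = next_segment_start(upstream_replay_ptr)

# Number of bytes by which the cascade's request is ahead of the upstream.
gap = cascade_read_ptr - upstream_replay_ptr

print("upstream flush :", format_lsn(upstream_replay_ptr))
print("cascade request:", format_lsn(cascade_read_ptr))
print("cascade ahead by", gap, "bytes")
```

Under these assumptions the cascade asks to start streaming from a position the upstream has not reached yet, even though both nodes replayed exactly the same archived segment.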