Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery
From: Xuneng Zhou <xunengzhou@gmail.com>
To: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Cc: Fujii Masao <masao.fujii@gmail.com>, pgsql-hackers@postgresql.org
Date: 2026-05-01T02:57:08Z
Lists: pgsql-hackers
Hi Marco,

On Tue, Apr 28, 2026 at 12:50 AM Marco Nenciarini
<marco.nenciarini@enterprisedb.com> wrote:

> v7 patches attached. No code changes from v6, just rebased on
> current master to remove minor offset, and the backpatch file is
> renamed with a "nocfbot-" prefix so the commitfest bot picks up
> only the master patch.
>
>
> On Mon, Apr 27, 2026 at 6:00 PM Marco Nenciarini <
> marco.nenciarini@enterprisedb.com> wrote:
>
>> Registered in PG20-1: https://commitfest.postgresql.org/patch/6716/
>>
>> On Sat, Mar 21, 2026 at 11:52 AM Marco Nenciarini <
>> marco.nenciarini@enterprisedb.com> wrote:
>>
>>> Here are the v6 patches.
>>>
>>> Xuneng correctly pointed out that RequestXLogStreaming rounds down,
>>> not up, so it isn't the cause of the gap. The actual mechanism is
>>> that archive recovery processes whole segment files: after both nodes
>>> replay the same archived segment N, the cascade's next read position
>>> lands at the start of segment N+1, while the upstream's
>>> GetStandbyFlushRecPtr returns replayPtr inside segment N.
>>>
>>> Changes from v5:
>>>
>>> - Updated the code comment and commit message to describe the correct
>>>   root cause (archive recovery segment granularity, not
>>>   RequestXLogStreaming truncation).
>>>
>>> - Reset the catchup state when the upstream is no longer behind.
>>>   Without this, if the walreceiver successfully streams, the
>>>   connection breaks, and it loops back to find itself ahead again,
>>>   the stale deadline from the previous wait would cause an immediate
>>>   timeout.
>>>
>>> Two patches attached: v6-0001 for master (extends the
>>> walrcv_identify_system API) and v6-backpatch-0001 for stable branches
>>> (global variable to preserve ABI).
>>>
>>

Polling at intervals still doesn't seem good to me, but I don't have a
better idea for now.

--
Best,
Xuneng
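
For readers following the thread, the segment-granularity gap described in the quoted v6 message can be sketched with a little arithmetic. This is only an illustration, not PostgreSQL code: the function names are hypothetical, LSNs are modeled as plain integers, and the default 16 MB WAL segment size is assumed.

```python
# Illustrative model only; none of these names exist in PostgreSQL.
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def segment_number(lsn: int) -> int:
    """Return the number of the WAL segment that contains this LSN."""
    return lsn // WAL_SEGMENT_SIZE


def segment_start(segno: int) -> int:
    """Return the LSN of the first byte of the given segment."""
    return segno * WAL_SEGMENT_SIZE


def cascade_next_read_position(last_archived_segno: int) -> int:
    """Archive recovery consumes whole segment files, so after replaying
    segment N the cascade's next read position is the start of N+1."""
    return segment_start(last_archived_segno + 1)


def streaming_gap(cascade_start: int, upstream_flush: int) -> int:
    """Bytes by which the cascade's requested start position is ahead of
    what the upstream reports as flushed (0 if it is not ahead)."""
    return max(0, cascade_start - upstream_flush)


# Both nodes replayed archived segment N = 5. The upstream's replay
# pointer stopped 0x2000 bytes into that segment (an arbitrary example).
N = 5
upstream_replay_ptr = segment_start(N) + 0x2000
cascade_start = cascade_next_read_position(N)
gap = streaming_gap(cascade_start, upstream_replay_ptr)
```

In this model the cascade asks to start streaming from segment 6 while the upstream's reported position is still inside segment 5, which is the "cascade ahead of upstream" condition the patch waits out.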