Re: Timeline switching with partial WAL records can break replica recovery

Alena Vinter <dlaaren8@gmail.com>

From: Alena Vinter <dlaaren8@gmail.com>
To: Artem Gavrilov <artem.gavrilov@percona.com>
Cc: Nataliia <k.natalissa@gmail.com>, pgsql-hackers@lists.postgresql.org
Date: 2025-12-26T07:09:08Z
Lists: pgsql-hackers


Hi Artem!

Thank you for the clarification about archiving. I now fully understand why
writing a missing contrecord into an already-archived timeline is unsafe.

Could this be avoided by having the standby check the WAL archive before
promotion? Specifically, if the standby detects an incomplete contrecord at
the end of its WAL stream, it would first try to fetch the missing
continuation from the archive. Only if the continuation is not found there
would the standby proceed with writing a missing contrecord and starting a
new timeline. What do you think?
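To make the proposed order of checks concrete, here is a rough sketch of the
decision at promotion time. All type and function names are hypothetical (this
is not existing PostgreSQL code), and the archive is simplified to "has WAL up
to some LSN" rather than being checked segment by segment:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: a WAL position, modeled as a 64-bit byte offset. */
typedef uint64_t WalPos;

/* Hypothetical outcome of the pre-promotion check. */
typedef enum {
    PROMOTE_CLEAN,               /* local WAL ends on a record boundary */
    PROMOTE_AFTER_ARCHIVE_FETCH, /* continuation is archived: restore and replay it first */
    PROMOTE_WITH_OVERWRITE       /* not archived: write missing contrecord, new timeline */
} PromoteAction;

/* Hypothetical model of the standby's WAL state when promotion is requested. */
typedef struct {
    WalPos end_of_local_wal;  /* where the streamed WAL stops */
    WalPos partial_rec_start; /* start of the last record, 0 if it is complete */
    WalPos partial_rec_end;   /* where that record would end if fully available */
} StandbyWalState;

/* Hypothetical archive model: all WAL below archived_upto can be restored. */
typedef struct {
    WalPos archived_upto;
} WalArchive;

/* True if the archive holds WAL at least up to 'upto'. */
static int archive_covers(const WalArchive *a, WalPos upto)
{
    return upto <= a->archived_upto;
}

/* Decide what to do before promoting, in the order proposed above. */
static PromoteAction choose_promote_action(const StandbyWalState *s,
                                           const WalArchive *a)
{
    /* No partial record at the end of local WAL: nothing to repair. */
    if (s->partial_rec_start == 0 || s->partial_rec_end <= s->end_of_local_wal)
        return PROMOTE_CLEAN;

    /* A partial record must begin inside the WAL we already have. */
    assert(s->partial_rec_start <= s->end_of_local_wal);

    /* First try the archive: if the rest of the record is there, use it. */
    if (archive_covers(a, s->partial_rec_end))
        return PROMOTE_AFTER_ARCHIVE_FETCH;

    /* Only now fall back to writing a missing contrecord on a new timeline. */
    return PROMOTE_WITH_OVERWRITE;
}
```

The point of the ordering is that the overwrite path, which is unsafe if the
old timeline was already archived past the partial record, is only taken when
the archive does not have the continuation.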

I plan to reproduce your described scenario to test both my original patch
and this revised approach.

P.S. I'm attaching my notes just so I don’t lose them =)

---
Alena Vinter