RE: Improve pg_sync_replication_slots() to wait for primary to advance

From: "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>
To: Amit Kapila <amit.kapila16@gmail.com>, shveta malik <shveta.malik@gmail.com>
Cc: Ajin Cherian <itsajin@gmail.com>, Yilin Zhang <jiezhilove@126.com>, Chao Li <li.evan.chao@gmail.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Japin Li <japinli@hotmail.com>, Ashutosh Sharma <ashu.coek88@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-12-17T10:28:28Z
Lists: pgsql-hackers

On Monday, December 15, 2025 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Fri, Dec 12, 2025 at 8:53 AM shveta malik <shveta.malik@gmail.com>
> wrote:
> >
> > On Fri, Dec 12, 2025 at 5:35 AM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> > >
> > > I have included these changes as well as comments by Chao. Attaching
> > > v37 with the changes.
> > >
> >
> > Thanks. v37 LGTM.
> >
> 
> Pushed.

My colleague reported a related BF failure[1] to me off-list.

After analyzing it, I think the issue is that the newly added test in
040_standby_failover_slots_sync synchronizes a replication slot to the standby
server without configuring synchronized_standby_slots. Without that setting,
a logical failover slot can advance past the position the standby has
received through its physical replication slot, which results in
intermittent synchronization failures.

I confirmed this from the log, where the slot sync failed for the reason
described above:

--
2025-12-15 12:30:33.502 CET [3015371][client backend][1/2:0] ERROR:  skipping slot synchronization because the received slot sync LSN 0/06017C90 for slot "lsub1_slot" is ahead of the standby position 0/06017C58
2025-12-15 12:30:33.502 CET [3015371][client backend][1/2:0] STATEMENT:  SELECT pg_sync_replication_slots();
--
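
To make the gap in that error concrete, here is a small Python sketch that
compares the two LSNs from the log. It is only an illustration and not part
of the patch; the helper names parse_lsn and slot_is_ahead are made up here.
It relies on the usual textual LSN format, where the part before the slash is
the high 32 bits and the part after it is the low 32 bits, both in hex.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/06017C90' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_is_ahead(slot_lsn: str, standby_lsn: str) -> bool:
    """Return True if the slot's LSN is past the standby's position."""
    return parse_lsn(slot_lsn) > parse_lsn(standby_lsn)


# Values copied from the buildfarm log excerpt.
slot_lsn = "0/06017C90"      # received slot sync LSN for lsub1_slot
standby_lsn = "0/06017C58"   # standby position at that moment

# Distance in bytes between the slot's LSN and the standby position.
gap = parse_lsn(slot_lsn) - parse_lsn(standby_lsn)

print(slot_is_ahead(slot_lsn, standby_lsn), gap)
```

So the slot was only a few dozen bytes of WAL ahead of the standby, which fits
a timing-dependent failure rather than a persistent one.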

Here is a small patch to fix it.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2025-12-15%2011%3A25%3A38

Best Regards,
Hou zj