Re: Improve pg_sync_replication_slots() to wait for primary to advance

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>

From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
To: Ashutosh Sharma <ashu.coek88@gmail.com>
Cc: Ajin Cherian <itsajin@gmail.com>, shveta malik <shveta.malik@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-09-08T04:20:04Z
Lists: pgsql-hackers

Commits

  1. Enhance slot synchronization API to respect promotion signal.

  2. Fix inconsistent elevel in pg_sync_replication_slots() retry logic.

  3. Refactor slot synchronization logic in slotsync.c.

  4. Fix intermittent BF failure in 040_standby_failover_slots_sync.

  5. Add retry logic to pg_sync_replication_slots().

  6. Fix LOCK_TIMEOUT handling in slotsync worker.

  7. Add slotsync skip statistics.

On Sat, Sep 6, 2025 at 12:05 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> On Fri, Sep 5, 2025 at 6:52 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> > > I've tested this and I see that interrupts are being handled by
> > > sending SIGQUIT and SIGINT to the backend process.
> >
> > Can you please point me to the code (the call to
> > CHECK_FOR_INTERRUPTS()) which processes these interrupts while
> > pg_sync_replication_slots() is executing, especially when the function
> > is waiting while syncing a slot?
> >
>
> I noticed that the function libpqrcv_processTuples, which is invoked
> by fetch_remote_slots, includes a CHECK_FOR_INTERRUPTS call. This is
> currently helping in processing interrupts while we are in an infinite
> loop within SyncReplicationSlots(). I’m just pointing this out based
> on my observation while reviewing the changes in this patch. Ajin,
> please correct me if I’m mistaken. If not, can we always rely on this
> particular check for interrupts?

It doesn't seem good to rely on a CHECK_FOR_INTERRUPTS() call so far
away. It's better to call it directly from SyncReplicationSlots(),
which has the wait loop. That's what other functions with
potentially long wait loops do.
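
To illustrate the shape being suggested, here is a minimal standalone
sketch, not code from slotsync.c or from the patch. Every name in it
(interrupt_pending, check_for_interrupts_stub, sync_with_wait_loop,
succeed_when_zero) is a hypothetical stand-in so that it compiles
outside the PostgreSQL tree. In the backend, CHECK_FOR_INTERRUPTS()
returns nothing and a pending cancel is raised as an error rather than
returned as a status. The loop is also bounded here only to keep the
sketch testable.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the backend's pending-interrupt flag. */
static volatile sig_atomic_t interrupt_pending = 0;

/*
 * Hypothetical stand-in for CHECK_FOR_INTERRUPTS(). It clears the flag and
 * reports whether an interrupt was pending. The real macro has no return
 * value; a cancel request is raised as an error instead.
 */
static bool
check_for_interrupts_stub(void)
{
    if (interrupt_pending)
    {
        interrupt_pending = 0;
        return true;
    }
    return false;
}

typedef enum
{
    SYNC_DONE,                  /* the attempt callback reported success */
    SYNC_CANCELLED,             /* an interrupt arrived while waiting */
    SYNC_GAVE_UP                /* attempt limit reached (sketch only) */
} SyncResult;

/* One synchronization attempt; returns true once the slot is synced. */
typedef bool (*sync_attempt_fn) (void *arg);

/*
 * The function that owns the wait loop checks for interrupts itself at the
 * top of every iteration, instead of relying on a check buried somewhere
 * inside the attempt callback.
 */
static SyncResult
sync_with_wait_loop(sync_attempt_fn attempt, void *arg, int max_attempts)
{
    assert(attempt != NULL && max_attempts > 0);

    for (int i = 0; i < max_attempts; i++)
    {
        if (check_for_interrupts_stub())
            return SYNC_CANCELLED;

        if (attempt(arg))
            return SYNC_DONE;

        /* Real code would sleep or wait on a latch here before retrying. */
    }
    return SYNC_GAVE_UP;
}

/* Demo callback: fails until the counter behind arg has reached zero. */
static bool
succeed_when_zero(void *arg)
{
    int *remaining = (int *) arg;

    if (*remaining > 0)
    {
        (*remaining)--;
        return false;
    }
    return true;
}
```

With this shape, the wait loop stays responsive to interrupts even if
the per-iteration work is later changed so that it no longer reaches
any interrupt check of its own.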

-- 
Best Wishes,
Ashutosh Bapat