Re: Improve pg_sync_replication_slots() to wait for primary to advance

Amit Kapila <amit.kapila16@gmail.com>

From: Amit Kapila <amit.kapila16@gmail.com>
To: Dilip Kumar <dilipbalaut@gmail.com>
Cc: shveta malik <shveta.malik@gmail.com>, Ajin Cherian <itsajin@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-07-19T11:39:49Z
Lists: pgsql-hackers

Commits

  1. Enhance slot synchronization API to respect promotion signal.

  2. Fix inconsistent elevel in pg_sync_replication_slots() retry logic.

  3. Refactor slot synchronization logic in slotsync.c.

  4. Fix intermittent BF failure in 040_standby_failover_slots_sync.

  5. Add retry logic to pg_sync_replication_slots().

  6. Fix LOCK_TIMEOUT handling in slotsync worker.

  7. Add slotsync skip statistics.

On Fri, Jul 18, 2025 at 11:31 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Jul 18, 2025 at 11:25 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Okay.  I see your point. Yes, it was non-blocking earlier but it was
> > not giving ERROR, it was just dumping in the logfile that the primary is
> > behind and thus slot-sync could not be done.
> >
> > If we continue using the non-blocking mode, there’s a risk that the
> > API may never successfully sync the slots. This is because it
> > eventually drops the temporary slot on exit, and when it tries to
> > create a new one later on a subsequent call, it’s likely that the new
> > slot will again be ahead of the primary. This may happen if we have
> > continuous ongoing writes on the primary and the logical slot is not
> > being consumed at the same pace.
> >
> > My preference would be to avoid including such an option as it is
> > confusing. With such an option in place, users may think that
> > slot-sync is completed while that may not be the case.
>
> Fair enough
>

I think if we want, we can return a bool and return false when the sync
is not complete, say due to promotion or some other reason such as a
timeout. However, at this stage it is not very clear whether it would be
useful to provide an additional timeout parameter. But we can consider
returning true/false depending on whether we are successful in syncing
the slots or not.
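
To make the boolean-return idea concrete, here is a minimal sketch of the
control flow in Python. It is not taken from slotsync.c. The names
sync_replication_slots_sketch, try_sync_once, promotion_requested and
max_attempts are all hypothetical, and the attempt limit only stands in for
a possible timeout, which may not be worth adding at all.

```python
from typing import Callable, Optional


def sync_replication_slots_sketch(
    try_sync_once: Callable[[], bool],
    promotion_requested: Callable[[], bool],
    max_attempts: Optional[int] = None,
) -> bool:
    """Retry until the slots are synced.

    Returns True on success, and False if promotion is requested or the
    attempt budget runs out. All names here are illustrative assumptions.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        if promotion_requested():
            return False  # standby is being promoted: report "not synced"
        if try_sync_once():
            return True  # slots synced successfully
        attempts += 1  # primary still behind; a real loop would wait here
    return False  # attempt budget (stand-in for a timeout) exhausted


def make_primary_catching_up(attempts_needed: int) -> Callable[[], bool]:
    """Hypothetical stand-in: the sync succeeds on the Nth attempt."""
    state = {"calls": 0}

    def try_sync_once() -> bool:
        state["calls"] += 1
        return state["calls"] >= attempts_needed

    return try_sync_once
```

With a return value like this, the caller can tell a completed sync apart
from one that was abandoned, without having to check the log.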

-- 
With Regards,
Amit Kapila.