Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>, shveta malik <shveta.malik@gmail.com>
Date: 2025-08-13T04:47:23Z
Lists: pgsql-hackers

Commits

  1. Enhance slot synchronization API to respect promotion signal.

  2. Fix inconsistent elevel in pg_sync_replication_slots() retry logic.

  3. Refactor slot synchronization logic in slotsync.c.

  4. Fix intermittent BF failure in 040_standby_failover_slots_sync.

  5. Add retry logic to pg_sync_replication_slots().

  6. Fix LOCK_TIMEOUT handling in slotsync worker.

  7. Add slotsync skip statistics.

On Mon, Aug 11, 2025 at 1:37 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 11:22 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> >
> > There's also a minor merge conflict because func.sgml is not split
> > into multiple files.
> >
>
> Yes, I fixed this.
>

Thanks for the patch. Please find a few comments:

1)
We can merge refresh_remote_slots and fetch_remote_slots by passing a
remote_list argument. If no remote_list is passed, fetch all failover
slots; otherwise, extend the query and fetch only the listed ones.
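
To illustrate the idea, here is a rough standalone sketch, not the
patch's code. The function name build_remote_slots_query, the
BASE_QUERY macro and the simplified one-column query are all
assumptions made for the example; the real query in slotsync.c selects
more columns, and real code would need to quote the slot names
properly instead of pasting them in.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the remote query (hypothetical, one column only). */
#define BASE_QUERY \
	"SELECT slot_name FROM pg_catalog.pg_replication_slots " \
	"WHERE failover AND NOT temporary"

/*
 * Build the remote-slot query into buf.
 *
 * With nnames == 0 the query fetches every failover slot. Otherwise it is
 * extended with a slot_name IN (...) filter for the listed slots.
 *
 * Assumption: names contain no quote characters; real code must escape them.
 * Returns 0 on success, -1 if buf is too small.
 */
static int
build_remote_slots_query(char *buf, size_t buflen,
						 const char *names[], size_t nnames)
{
	size_t		used;
	int			n;

	assert(buf != NULL && buflen > 0);

	n = snprintf(buf, buflen, "%s", BASE_QUERY);
	if (n < 0 || (size_t) n >= buflen)
		return -1;
	used = (size_t) n;

	for (size_t i = 0; i < nnames; i++)
	{
		/* First name opens the IN list, later names are comma-separated. */
		n = snprintf(buf + used, buflen - used, "%s'%s'",
					 i == 0 ? " AND slot_name IN (" : ", ", names[i]);
		if (n < 0 || (size_t) n >= buflen - used)
			return -1;
		used += (size_t) n;
	}

	if (nnames > 0)
	{
		/* Close the IN list. */
		n = snprintf(buf + used, buflen - used, ")");
		if (n < 0 || (size_t) n >= buflen - used)
			return -1;
	}

	return 0;
}
```

The first cycle would call this with no names, and later cycles would
pass the names of the slots still being tracked, so a single function
covers both cases.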

2)
We can get rid of 'sync_iterations' and the logic within, as I think
there is no need to distinguish between slotsync and API in terms of
logs.

3)
sync_start_pending need not be passed to
update_and_persist_local_synced_slot(), as the output of this function
is good enough to tell whether the slot is persisted or not.

4)
Also, how about having sync-pending in SlotSyncCtxStruct? It can be set
unconditionally by both slotsync and the API, but will be used only by
the API. I think it can simplify the code.
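
Something along these lines, as a rough standalone sketch covering
points 3 and 4 together. All the sketch_* and Sketch* names are made
up for the example, and the bool return of the persist function is an
assumption standing in for the output of
update_and_persist_local_synced_slot(); it is not the actual slotsync.c
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared context; only the new flag is shown. */
typedef struct SketchSlotSyncCtx
{
	bool		sync_pending;	/* set when any slot could not be persisted */
} SketchSlotSyncCtx;

/*
 * Stand-in for update_and_persist_local_synced_slot(). Assumed to return
 * true when the slot was persisted and false when it has to wait.
 */
static bool
sketch_update_and_persist(bool remote_caught_up)
{
	return remote_caught_up;
}

/* Common per-slot path, used by both the worker and the API. */
static void
sketch_sync_one_slot(SketchSlotSyncCtx *ctx, bool remote_caught_up)
{
	assert(ctx != NULL);

	/* The flag is set unconditionally; no extra output parameter needed. */
	if (!sketch_update_and_persist(remote_caught_up))
		ctx->sync_pending = true;
}

/* API side: run one cycle and report whether another cycle is needed. */
static bool
sketch_sync_cycle(SketchSlotSyncCtx *ctx, const bool *caught_up, size_t nslots)
{
	ctx->sync_pending = false;
	for (size_t i = 0; i < nslots; i++)
		sketch_sync_one_slot(ctx, caught_up[i]);
	return ctx->sync_pending;
}
```

Here only the API-side loop reads the flag to decide whether to retry;
the worker path would set it in the same way and simply ignore it.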

5)
We can get rid of 'pending_sync_start_slots', as it is not being used anywhere.

6)
Also, we can mention in the comments why we are using the old
remote_slots list in refresh_remote_slots() during subsequent cycles
of the API rather than using only the pending-slot list.

thanks
Shveta