Re: Improve pg_sync_replication_slots() to wait for primary to advance

Ashutosh Sharma <ashu.coek88@gmail.com>

From: Ashutosh Sharma <ashu.coek88@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: shveta malik <shveta.malik@gmail.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-09-08T04:33:20Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Enhance slot synchronization API to respect promotion signal.

  2. Fix inconsistent elevel in pg_sync_replication_slots() retry logic.

  3. Refactor slot synchronization logic in slotsync.c.

  4. Fix intermittent BF failure in 040_standby_failover_slots_sync.

  5. Add retry logic to pg_sync_replication_slots().

  6. Fix LOCK_TIMEOUT handling in slotsync worker.

  7. Add slotsync skip statistics.

Hi,

Sharing some of my review comments, please look if these make sense to you.

On Wed, Sep 3, 2025 at 3:20 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Wed, Sep 3, 2025 at 6:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> > > Attaching v10 with the above changes.
> > >
> >
> > The patch does not apply on HEAD. Can you please rebase?
>
> Rebased and made a small change as well to use TopMemoryContext rather
> than create a new context for slot_list.
>

+ /*
+ * If we've been promoted, then no point
+ * continuing.
+ */
+ if (SlotSyncCtx->stopSignaled)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("exiting from slot synchronization as"
+ " promotion is triggered")));
+ break;
+ }
"break" statement here looks redundant to me.

--

+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retry is done after 2
+ * sec wait. Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */

wait for 2 seconds before retrying - the constant is
SLOTSYNC_API_NAPTIME_MS, so technically it may *not* always be 2s if
the macro changes. Maybe reword to “wait for SLOTSYNC_API_NAPTIME_MS
before retrying” would look better?

--

/* Retry until all slots are sync ready atleast */

and


/* Done if all slots are atleast sync ready */

atleast -> "at least". I am just making this comment because at few
places in the same file I see "at least" and not "atleast".

--

+static void ProcessSlotSyncInterrupts(void);

Is this change related to this patch?

--

+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">

double "be".

--
With Regards,
Ashutosh Sharma.