Re: Improve pg_sync_replication_slots() to wait for primary to advance

shveta malik <shveta.malik@gmail.com>

From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Japin Li <japinli@hotmail.com>, Ashutosh Sharma <ashu.coek88@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>, shveta malik <shveta.malik@gmail.com>
Date: 2025-11-26T08:42:29Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Enhance slot synchronization API to respect promotion signal.

  2. Fix inconsistent elevel in pg_sync_replication_slots() retry logic.

  3. Refactor slot synchronization logic in slotsync.c.

  4. Fix intermittent BF failure in 040_standby_failover_slots_sync.

  5. Add retry logic to pg_sync_replication_slots().

  6. Fix LOCK_TIMEOUT handling in slotsync worker.

  7. Add slotsync skip statistics.

 A few comments:

1)
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+ WalReceiverConn *wrconn;
+ List *slot_names;
+} SlotSyncApiFailureParams;
+

We can get rid of it now as we do not use it.

2)
ProcessSlotSyncInterrupts():

+ if (!AmLogicalSlotSyncWorkerProcess())
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot continue replication slots synchronization"
+    " as standby promotion is triggered"));
+ else
+ {

Can we please reverse the if-else i.e. first worker and then API.
Negated if-condition can be avoided in this case.

3)

slotsync_reread_config():
+ /* Worker-specific check for sync_replication_slots change */

Now since we check for both API and worker, this comment is not needed.

4)
- ereport(LOG,
- errmsg("replication slot synchronization worker will restart because
of a parameter change"));

- /*
- * Reset the last-start time for this worker so that the postmaster
- * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
- */
- SlotSyncCtx->last_start_time = 0;
+ ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
+ errmsg("replication slot synchronization will stop because of a
parameter change"));

Here, we should retain the same old message for worker i.e. 'worker
will restart..'. instead of 'synchronization will stop'. I find the
old message better in this case.

5)
slotsync_reread_config() is slightly difficult to follow.
I think in the case of API, we can display a common error message
instead of 2 different messages for 'sync_replication_slot change' and
the rest of the parameters. We can mark if any of the parameters
changed in both 'if' blocks and if the current process has not exited,
then at the end based on 'parameter-changed', we can deal with API by
giving a common message. Something like:

/*
 * If we have reached here with a parameter change, we must be running
 * in SQL function, emit error in such a case.
 */
if (parameter_changed (new variable))
{
Assert (!AmLogicalSlotSyncWorkerProcess);
ereport(ERROR,
errmsg("replication slot synchronization will stop because of a
parameter change"));
}

thanks
Shveta