Re: Improve pg_sync_replication_slots() to wait for primary to advance

shveta malik <shveta.malik@gmail.com>

From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Japin Li <japinli@hotmail.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Ashutosh Sharma <ashu.coek88@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>, shveta malik <shveta.malik@gmail.com>
Date: 2025-11-04T06:23:34Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Enhance slot synchronization API to respect promotion signal.

  2. Fix inconsistent elevel in pg_sync_replication_slots() retry logic.

  3. Refactor slot synchronization logic in slotsync.c.

  4. Fix intermittent BF failure in 040_standby_failover_slots_sync.

  5. Add retry logic to pg_sync_replication_slots().

  6. Fix LOCK_TIMEOUT handling in slotsync worker.

  7. Add slotsync skip statistics.

>
> I have addressed the above comments in patch v21.
>

Thank You. Please find a few comments:

1)
+ fparams.slot_names = slot_names = NIL;

I think it is not needed to set slot_names to NIL.

2)
-    WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+    WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);

The new name does not seem appropriate. For the slotsync-worker case,
even when the primary is not behind, the worker still waits but it is
not waiting for primary to catch-up. I could not find a better name
except the original one 'WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN'. We can
change the explanation to :

"Waiting in main loop of slot sync worker and slot sync API."
Or
"Waiting in main loop of slot synchronization."

If anyone has any better name suggestions, we can consider changing.

3)

+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up with the locally reserved
+# position. The API will not return until this happens, to be able to make
+# further calls, call the API in a background process.

Shall we remove 'with the locally reserved position', it’s already
explained in the test header and the comment is good enough even
without it.

4)
+# Confirm log that the slot has been synced after becoming sync-ready.

Shall we just say:
Confirm from the log that the slot is sync-ready now.

5)
 # Synchronize the primary server slots to the standby.
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
@@ -945,6 +1007,7 @@ $subscriber1->safe_psql('postgres',
 is( $standby1->safe_psql(
  'postgres',
  q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN
('lsub1_slot', 'snap_test_slot') AND synced AND NOT temporary;}
+
  ),

Redundant change.

thanks
Shveta