Re: Improve pg_sync_replication_slots() to wait for primary to advance
From: Chao Li <li.evan.chao@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Japin Li <japinli@hotmail.com>,
shveta malik <shveta.malik@gmail.com>,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>,
Ashutosh Sharma <ashu.coek88@gmail.com>,
Amit Kapila <amit.kapila16@gmail.com>,
PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-10-30T11:15:34Z
Lists: pgsql-hackers
Commits
- Enhance slot synchronization API to respect promotion signal.
  - 4bed04d39566 17.10 landed
  - 94efd308bcec 18.4 landed
  - 1362bc33e025 19 (unreleased) landed
- Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
  - f1ddaa15357f 19 (unreleased) landed
- Refactor slot synchronization logic in slotsync.c.
  - 788ec96d591d 19 (unreleased) landed
- Fix intermittent BF failure in 040_standby_failover_slots_sync.
  - b47c50e5667b 19 (unreleased) landed
- Add retry logic to pg_sync_replication_slots().
  - 0d2d4a0ec3ec 19 (unreleased) landed
- Fix LOCK_TIMEOUT handling in slotsync worker.
  - 04396eacd3fa 19 (unreleased) cited
- Add slotsync skip statistics.
  - 76b78721ca49 19 (unreleased) cited

Hi Ajin,
I have reviewed v20 and have a few comments:
> On Oct 30, 2025, at 18:18, Ajin Cherian <itsajin@gmail.com> wrote:
>
> <v20-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch>
1 - slotsync.c
```
+ if (slot_names)
+ list_free_deep(slot_names);
/* Cleanup the synced temporary slots */
ReplicationSlotCleanup(true);
@@ -1762,5 +2026,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
/* We are done with sync, so reset sync flag */
reset_syncing_flag();
}
- PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+ PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
```
I am afraid there is a risk of a double free. slot_names is assigned to fparams.slot_names within the for loop, and it is freed after the loop. If something goes wrong after that point and slotsync_failure_callback() is called, the callback will free fparams.slot_names again.
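To make the concern concrete, here is a minimal standalone sketch (plain C, not PostgreSQL code) of one way to avoid it: release the list through a single helper that clears the owning pointer, so a later cleanup callback finds nothing left to free. All names here (SketchFailureParams, sketch_release_slot_names, sketch_failure_callback, sketch_run) are hypothetical stand-ins for illustration, and I am assuming the callback frees fparams.slot_names as described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's failure-callback parameters. */
typedef struct SketchFailureParams
{
    char  **slot_names;     /* owned array of names, NULL once released */
    size_t  nslots;
} SketchFailureParams;

/* Counts how many times the list was actually freed. */
static int sketch_free_calls = 0;

/* Copy a string with malloc (avoids relying on the non-ISO strdup). */
static char *
sketch_copy(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    assert(p != NULL);
    strcpy(p, s);
    return p;
}

/* Free the list at most once; clearing the pointer makes repeats a no-op. */
static void
sketch_release_slot_names(SketchFailureParams *fparams)
{
    if (fparams->slot_names == NULL)
        return;

    for (size_t i = 0; i < fparams->nslots; i++)
        free(fparams->slot_names[i]);
    free(fparams->slot_names);

    fparams->slot_names = NULL;
    fparams->nslots = 0;
    sketch_free_calls++;
}

/* Stand-in for the error cleanup callback. */
static void
sketch_failure_callback(SketchFailureParams *fparams)
{
    sketch_release_slot_names(fparams);
}

/*
 * Normal path frees the list after the loop, then a simulated late error
 * invokes the callback. Returns the number of real frees performed.
 */
static int
sketch_run(void)
{
    SketchFailureParams fparams = {NULL, 0};

    sketch_free_calls = 0;

    fparams.slot_names = malloc(2 * sizeof(char *));
    assert(fparams.slot_names != NULL);
    fparams.slot_names[0] = sketch_copy("slot_a");
    fparams.slot_names[1] = sketch_copy("slot_b");
    fparams.nslots = 2;

    sketch_release_slot_names(&fparams);    /* cleanup after the loop */
    sketch_failure_callback(&fparams);      /* late error: must not free again */

    return sketch_free_calls;
}
```

In the patch itself, the equivalent would presumably be to reset fparams.slot_names to NIL right after list_free_deep(slot_names), or to let only one place own the free; I have not checked which fits better.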
2 - slotsync.c
```
+ /*
+ * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+ * fetch all failover-enabled slots. Note that we reuse slot_names from
+ * the first iteration; re-fetching all failover slots each time could
+ * cause an endless loop. Instead of reprocessing only the pending slots
+ * in each iteration, it's better to process all the slots received in
+ * the first iteration. This ensures that by the time we're done, all
+ * slots reflect the latest values.
+ */
+ remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+ /* Attempt to synchronize slots */
+ some_slot_updated = synchronize_slots(wrconn, remote_slots,
+ &slot_persistence_pending);
+
+ /*
+ * If slot_persistence_pending is true, extract slot names
+ * for future iterations (only needed if we haven't done it yet)
+ */
+ if (slot_names == NIL && slot_persistence_pending)
+ {
+ slot_names = extract_slot_names(remote_slots);
+
+ /* Update the failure structure so that it can be freed on error */
+ fparams.slot_names = slot_names;
+ }
```
I wonder whether this could be a problem. Since extract_slot_names() is now called only once, in the first iteration that needs it, if a slot is dropped and a new slot is then created with the same name, will the new slot be incorrectly synced?
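A small standalone sketch of the scenario I have in mind (plain C, not PostgreSQL code). The struct, the "incarnation" counter and the function names are all hypothetical and exist only to tell the old slot from the new one in this example; whether the real code has an equivalent way to detect this is exactly my question.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical remote slot: a name plus a marker for which creation it is. */
typedef struct SketchRemoteSlot
{
    const char *name;
    int         incarnation;    /* illustrative only, not a real slot field */
} SketchRemoteSlot;

/* Look a slot up purely by name, as a cached name list would do. */
static const SketchRemoteSlot *
sketch_find_by_name(const SketchRemoteSlot *slots, size_t nslots,
                    const char *name)
{
    for (size_t i = 0; i < nslots; i++)
    {
        if (strcmp(slots[i].name, name) == 0)
            return &slots[i];
    }
    return NULL;
}

/*
 * Returns true when a name cached in the first iteration matches a slot
 * that was dropped and re-created before the second iteration.
 */
static bool
sketch_name_reuse_goes_unnoticed(void)
{
    /* First iteration: fetch all slots and remember only the names. */
    SketchRemoteSlot first_fetch[] = {{"sub1_slot", 1}};
    const char *cached_name = first_fetch[0].name;
    int         seen_incarnation = first_fetch[0].incarnation;

    /* Between iterations the slot is dropped and re-created. */
    SketchRemoteSlot second_fetch[] = {{"sub1_slot", 2}};

    /* Second iteration: fetch by the cached name. */
    const SketchRemoteSlot *hit =
        sketch_find_by_name(second_fetch, 1, cached_name);

    /* The name matches even though it is a different slot. */
    return hit != NULL && hit->incarnation != seen_incarnation;
}
```

Here the lookup succeeds on the second pass, so a caller that compares only names would treat the re-created slot as the one it was already waiting on.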
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/