Re: Improve pg_sync_replication_slots() to wait for primary to advance
shveta malik <shveta.malik@gmail.com>
From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Ashutosh Sharma <ashu.coek88@gmail.com>,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>,
shveta malik <shveta.malik@gmail.com>
Date: 2025-09-10T04:45:31Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
-
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
-
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
-
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
-
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
-
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
-
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited
On Tue, Sep 9, 2025 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote: > > Attached v11 patch addressing the above comments. > Please find a few comments: 1) + Retry is done after 2 + * sec wait. Exits early if promotion is triggered or certain critical We can say: Retry is done after SLOTSYNC_API_NAPTIME_MS wait. 2) + /* + * Fetch remote slot info for the given slot_names. If slot_names is NIL, + * fetch all failover-enabled slots. Note that we reuse slot_names from + * the previous iteration; re-fetching all failover slots each time could + * cause an endless loop. + */ a) the previous iteration --> the first iteration. b) Also we can mention the reason why we take names from first iteration instead of going for pending ones alone, something like: Instead of reprocessing only the pending slots in each iteration, it's better to process all the slots received in the first iteration. This ensures that by the time we're done, all slots reflect the latest values. 3) + remote_slots = fetch_remote_slots(wrconn, slot_names); + + + /* Attempt to synchronize slots */ + synchronize_slots(wrconn, remote_slots, &slot_persistence_pending); One extra blank line can be removed 4) + /* Clean up slot_names if allocated in TopMemoryContext */ + if (slot_names) + list_free_deep(slot_names); Can we please move it before 'ReplicationSlotCleanup'. 5) In case of error in subsequent iteration, slot_names allocated from TopMemoryContext will be left unfreed? 6) + ListCell *lc; + bool first_slot = true; Shall we move these two to concerned if-block: if (slot_names != NIL) 7) * The slot_persistence_pending flag is used by the pg_sync_replication_slots * API to track if any slots could not be persisted and need to be retried. a) Instead of mentioning only about slot_persistence_pending argument in concerned function's header, we shall define all the arguments. b) We can remove the 'flag' term from the comments as it is a function-argument now. 8) I think we should add briefly in the header of the file about the new behaviour of API. thanks Shveta