Re: Improve pg_sync_replication_slots() to wait for primary to advance
Ajin Cherian <itsajin@gmail.com>
From: Ajin Cherian <itsajin@gmail.com>
To: shveta malik <shveta.malik@gmail.com>
Cc: Ashutosh Sharma <ashu.coek88@gmail.com>,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-09-15T12:47:43Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
-
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
-
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
-
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
-
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
-
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
-
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited
Attachments
- v12-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch (application/octet-stream) patch v12-0001
On Wed, Sep 10, 2025 at 2:45 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Sep 9, 2025 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > Attached v11 patch addressing the above comments. > > > > Please find a few comments: > > 1) > > + Retry is done after 2 > + * sec wait. Exits early if promotion is triggered or certain critical > > We can say: Retry is done after SLOTSYNC_API_NAPTIME_MS wait. > Changed. > 2) > + /* > + * Fetch remote slot info for the given slot_names. If slot_names is NIL, > + * fetch all failover-enabled slots. Note that we reuse slot_names from > + * the previous iteration; re-fetching all failover slots each time could > + * cause an endless loop. > + */ > > a) > the previous iteration --> the first iteration. > > b) Also we can mention the reason why we take names from first > iteration instead of going for pending ones alone, something like: > > Instead of reprocessing only the pending slots in each iteration, it's > better to process all the slots received in the first iteration. > This ensures that by the time we're done, all slots reflect the latest values. > > 3) > + remote_slots = fetch_remote_slots(wrconn, slot_names); > + > + > + /* Attempt to synchronize slots */ > + synchronize_slots(wrconn, remote_slots, &slot_persistence_pending); > > One extra blank line can be removed > Fixed. > 4) > > + /* Clean up slot_names if allocated in TopMemoryContext */ > + if (slot_names) > + list_free_deep(slot_names); > > Can we please move it before 'ReplicationSlotCleanup'. > Fixed. > 5) > In case of error in subsequent iteration, slot_names allocated from > TopMemoryContext will be left unfreed? > I've changed the logic so that even on error, slot_names are freed. > 6) > + ListCell *lc; > + bool first_slot = true; > > Shall we move these two to concerned if-block: > if (slot_names != NIL) > Changed. > 7) > * The slot_persistence_pending flag is used by the pg_sync_replication_slots > * API to track if any slots could not be persisted and need to be retried. > > a) Instead of mentioning only about slot_persistence_pending argument > in concerned function's header, we shall define all the arguments. > > b) We can remove the 'flag' term from the comments as it is a > function-argument now. > Changed. > 8) > I think we should add briefly in the header of the file about the new > behaviour of API. > Added. Attaching patch v12 addressing these comments. regards, Ajin Cherian Fujitsu Australia