Re: Improve pg_sync_replication_slots() to wait for primary to advance

Ajin Cherian <itsajin@gmail.com>

From: Ajin Cherian <itsajin@gmail.com>

To: shveta malik <shveta.malik@gmail.com>

Cc: Ashutosh Sharma <ashu.coek88@gmail.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>

Date: 2025-09-15T12:47:43Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited

Attachments

v12-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch (application/octet-stream) patch v12-0001

On Wed, Sep 10, 2025 at 2:45 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Sep 9, 2025 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attached v11 patch addressing the above comments.
> >
>
> Please find a few comments:
>
> 1)
>
> + Retry is done after 2
> + * sec wait. Exits early if promotion is triggered or certain critical
>
> We can say: Retry is done after SLOTSYNC_API_NAPTIME_MS wait.
>

Changed.

> 2)
> + /*
> + * Fetch remote slot info for the given slot_names. If slot_names is NIL,
> + * fetch all failover-enabled slots. Note that we reuse slot_names from
> + * the previous iteration; re-fetching all failover slots each time could
> + * cause an endless loop.
> + */
>
> a)
> the previous iteration --> the first iteration.
>
> b) Also we can mention the reason why we take names from first
> iteration instead of going for pending ones alone, something like:
>
> Instead of reprocessing only the pending slots in each iteration, it's
> better to process all the slots received in the first iteration.
> This ensures that by the time we're done, all slots reflect the latest values.
>
> 3)
> + remote_slots = fetch_remote_slots(wrconn, slot_names);
> +
> +
> + /* Attempt to synchronize slots */
> + synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);
>
> One extra blank line can be removed
>

Fixed.

> 4)
>
> + /* Clean up slot_names if allocated in TopMemoryContext */
> + if (slot_names)
> + list_free_deep(slot_names);
>
> Can we please move it before 'ReplicationSlotCleanup'.
>

Fixed.

> 5)
> In case of error in subsequent iteration, slot_names allocated from
> TopMemoryContext will be left unfreed?
>

I've changed the logic so that even on error, slot_names are freed.

> 6)
> + ListCell   *lc;
> + bool first_slot = true;
>
> Shall we move these two to concerned if-block:
> if (slot_names != NIL)
>

Changed.

> 7)
>  * The slot_persistence_pending flag is used by the pg_sync_replication_slots
>  * API to track if any slots could not be persisted and need to be retried.
>
> a) Instead of mentioning only about slot_persistence_pending argument
> in concerned function's header, we shall define all the arguments.
>
> b) We can remove the 'flag' term from the comments as it is a
> function-argument now.
>

Changed.

> 8)
> I think we should add briefly in the header of the file about the new
> behaviour of API.
>

Added.

Attaching patch v12 addressing these comments.

regards,
Ajin Cherian
Fujitsu Australia