Re: Improve pg_sync_replication_slots() to wait for primary to advance

Ajin Cherian <itsajin@gmail.com>

From: Ajin Cherian <itsajin@gmail.com>
To: shveta malik <shveta.malik@gmail.com>
Cc: Ashutosh Sharma <ashu.coek88@gmail.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-09-22T10:51:14Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Enhance slot synchronization API to respect promotion signal.

  2. Fix inconsistent elevel in pg_sync_replication_slots() retry logic.

  3. Refactor slot synchronization logic in slotsync.c.

  4. Fix intermittent BF failure in 040_standby_failover_slots_sync.

  5. Add retry logic to pg_sync_replication_slots().

  6. Fix LOCK_TIMEOUT handling in slotsync worker.

  7. Add slotsync skip statistics.

Attachments

On Tue, Sep 16, 2025 at 4:23 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Sep 15, 2025 at 6:17 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> Thank You for the patch. Please find a few comments:
>
> 1)
> + bool slot_persistence_pending = false;
>
> We can move this declaration outside of the loop. And I think we don't
> need to initialize as we are resetting it to false before each
> iteration.
>

Fixed.

> 2)
>
> + /* Switch to long-lived TopMemoryContext to store slot names */
> + oldcontext = MemoryContextSwitchTo(TopMemoryContext);
> +
> + /* Extract slot names from the remote slots */
> + slot_names = extract_slot_names(remote_slots);
> +
> + MemoryContextSwitchTo(oldcontext);
>
> I think it will be better if we move 'MemoryContextSwitchTo' calls
> inside extract_slot_names() itself. The entire logic related to
> 'slot_names' will then be consolidated in one place
>

Changed,

> 3)
> + * The slot_persistence_pending flag is used by the pg_sync_replication_slots
> + * API to track if any slots could not be persisted and need to be retried.
>
> Can we change it to below. We can have it started in a new line after
> a blank line (see how remote_slot_precedes, found_consistent_snapshot
> are defined)
>
> *slot_persistence_pending is set to true if any of the slots fail to
> persist. It is utilized by the pg_sync_replication_slots() API.
>
> Please change it in both synchronize_one_slot() and
> update_and_persist_local_synced_slot()
>

Changed.

> 4)
> a)
> + Update the
> + * slot_persistence_pending flag, so the API can retry.
>   */
>
> b)
>  /* update the flag, so that the API can retry */
>
> It will be good if we can remove 'flag' usage from both occurrences in
> update_and_persist_local_synced_slot().
>

Changed.

> 5)
> Similar to ProcessSlotSyncInterrupts() for worker, shall we have one
> such function for API which can have all 3 things:
>
> {
> /*
>  * If we've been promoted, then no point
>  * continuing.
>  */
> if (SlotSyncCtx->stopSignaled)
> {
>    ereport(ERROR,
>   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>   errmsg("exiting from slot synchronization as"
>   " promotion is triggered")));
> }
>
> CHECK_FOR_INTERRUPTS();
>
> if (ConfigReloadPending)
>     slotsync_api_reread_config();
> }
>
> And similar to the worker case, we can have it checked in the
> beginning of the loop. Thoughts?
>

Changed it and added a function - ProcessSlotSyncAPIChanges()

Created a patch v13 with these changes.

regards,
Ajin Cherian
Fujitsu Australia