Re: Improve pg_sync_replication_slots() to wait for primary to advance
Amit Kapila <amit.kapila16@gmail.com>
From: Amit Kapila <amit.kapila16@gmail.com>
To: shveta malik <shveta.malik@gmail.com>
Cc: Ajin Cherian <itsajin@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-08-04T06:00:49Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
-
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
-
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
-
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
-
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
-
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
-
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited
On Fri, Aug 1, 2025 at 2:50 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> 5)
> I tried a test where there were 4 slots on the publisher, where one
> was getting used while the others were not. Initiated
> pg_sync_replication_slots on standby. Forced unused slots to be
> invalidated by setting idle_replication_slot_timeout=60 on primary,
> due to which API finished but gave a warning:
>
> postgres=# SELECT pg_sync_replication_slots();
> WARNING: aborting initial sync for slot "failover_slot"
> DETAIL: This slot was invalidated on the primary server.
> WARNING: aborting initial sync for slot "failover_slot2"
> DETAIL: This slot was invalidated on the primary server.
> WARNING: aborting initial sync for slot "failover_slot3"
> DETAIL: This slot was invalidated on the primary server.
> pg_sync_replication_slots
> ---------------------------
>
> (1 row)
>
> Do we need these warnings here? I think we can have it as a LOG rather
> than having it on console. Thoughts?
>
What is the behaviour of a slotsync worker in the same case? I don't
see any such WARNING messages in the code of slotsync worker, so why
do we want a different behaviour here?
Few other comments:
======================
1.
update_and_persist_local_synced_slot()
{
...
+ /*
+ * For SQL API synchronization, we wait for the remote slot to catch up
+ * here, since we can't assume the SQL API will be called again soon.
+ * We will retry the sync once the slot catches up.
+ *
+ * Note: This will return false if a promotion is triggered on the
+ * standby while waiting, in which case we stop syncing and drop the
+ * temporary slot.
+ */
+ if (!wait_for_primary_slot_catchup(wrconn, remote_slot))
+ return false;
Why is the wait added at this level? Shouldn't it be at API level aka
in SyncReplicationSlots() or pg_sync_replication_slots() similar to
what we do in ReplSlotSyncWorkerMain() for slotsync workers?
2.
REPLICATION_SLOTSYNC_MAIN "Waiting in main loop of slot sync worker."
...
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP "Waiting for the primary to catch-up."
Can't we reuse existing waitevent REPLICATION_SLOTSYNC_MAIN? We may
want to change the description. Is there a specific reason to add a
new wait_event for this API?
--
With Regards,
Amit Kapila.