Re: Improve pg_sync_replication_slots() to wait for primary to advance
shveta malik <shveta.malik@gmail.com>
From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>,
Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>,
shveta malik <shveta.malik@gmail.com>
Date: 2025-08-19T09:42:05Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
-
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
-
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
-
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
-
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
-
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
-
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited
On Tue, Aug 19, 2025 at 10:55 AM Ajin Cherian <itsajin@gmail.com> wrote: > > Attaching patch v7 addressing all the above comments. > Thank You for the patches. Please find a few comments: 1) We are not resetting 'slot_persistence_pending' to false anywhere. So once it hits the flow which sets it to true, it will never become false even if remote-slot catches up in subsequent cycles, resulting in a hang of the API. We shall reset it before starting a new iteration in SyncReplicationSlots(). 2) We need to clean 'slot_persistence_pending' in reset_syncing_flag() as well which is called at the end of API or in failure of API. Even though the slot sync worker is not using it, we should clean it up in slotsync_worker_onexit() as well. 3) + /* slot has been persisted, no need to retry */ + SlotSyncCtx->slot_persistence_pending |= false; + This will not be needed once we reset this flag before each iteration in SyncReplicationSlots() 4) -synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid) +synchronize_one_slot(WalReceiverConn * wrconn, RemoteSlot * remote_slot, + Oid remote_dbid) wrconn not used anywhere. 5) + bool is_refresh = (target_slot_list!= NIL); is_refresh is not needed. We can simply check if target_slot_list!=NIL, then append it to cmd. 6) * If remote_slot_list is NIL, fetches all failover logical slots from the * primary server. If remote_slot_list is provided, refreshes only those * specific slots with current values from the primary server. The usage of the word 'refreshing' is confusing. Since we are allocating a new remote-list everytime (instead of reusing or refreshing previous one), we can simply say: ------ Fetches the failover logical slots info from the primary server If target_slot_list is NIL, fetches all failover logical slots from the primary server, otherwise fetches only the ones mentioned in target_slot_list ------ The 'Parameters:' can also be adjusted accordingly. 7) * Returns a list of RemoteSlot structures. If refreshing and the query fails, * returns the original list. Slots that no longer exist on the primary will * be removed from the list. This can be removed. 8) - * If restart_lsn, confirmed_lsn or catalog_xmin is invalid but the - * slot is valid, that means we have fetched the remote_slot in its - * RS_EPHEMERAL state. In such a case, don't sync it; we can always - * sync it in the next sync cycle when the remote_slot is persisted - * and has valid lsn(s) and xmin values. - * - * XXX: In future, if we plan to expose 'slot->data.persistency' in - * pg_replication_slots view, then we can avoid fetching RS_EPHEMERAL - * slots in the first place. + * Apply ephemeral slot filtering. Skip slots that are in RS_EPHEMERAL + * state (invalid LSNs/xmin but not explicitly invalidated). We can retain the original comment. 9) Apart from above, there are many changes (alignement, comments etc) which are not related to this particular improvement. We can get rid of those changes. The patch should have the changes pertaining to current improvement alone. thanks Shveta