Re: Improve pg_sync_replication_slots() to wait for primary to advance
From: Amit Kapila <amit.kapila16@gmail.com>
To: shveta malik <shveta.malik@gmail.com>
Cc: Dilip Kumar <dilipbalaut@gmail.com>, Ajin Cherian <itsajin@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-07-21T05:55:42Z
Lists: pgsql-hackers
Commits
- Enhance slot synchronization API to respect promotion signal.
  - 4bed04d39566 17.10 landed
  - 94efd308bcec 18.4 landed
  - 1362bc33e025 19 (unreleased) landed
- Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
  - f1ddaa15357f 19 (unreleased) landed
- Refactor slot synchronization logic in slotsync.c.
  - 788ec96d591d 19 (unreleased) landed
- Fix intermittent BF failure in 040_standby_failover_slots_sync.
  - b47c50e5667b 19 (unreleased) landed
- Add retry logic to pg_sync_replication_slots().
  - 0d2d4a0ec3ec 19 (unreleased) landed
- Fix LOCK_TIMEOUT handling in slotsync worker.
  - 04396eacd3fa 19 (unreleased) cited
- Add slotsync skip statistics.
  - 76b78721ca49 19 (unreleased) cited

On Mon, Jul 21, 2025 at 10:08 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Sat, Jul 19, 2025 at 5:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jul 18, 2025 at 11:31 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Jul 18, 2025 at 11:25 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > Okay. I see your point. Yes, it was non-blocking earlier but it was
> > > > not giving ERROR, it was just dumping in logfile that primary is
> > > > behind and thus slot-sync could not be done.
> > > >
> > > > If we continue using the non-blocking mode, there’s a risk that the
> > > > API may never successfully sync the slots. This is because it
> > > > eventually drops the temporary slot on exit, and when it tries to
> > > > create a new one later on subsequent call, it’s likely that the new
> > > > slot will again be ahead of the primary. This may happen if we have
> > > > continuous ongoing writes on the primary and the logical slot is not
> > > > being consumed at the same pace.
> > > >
> > > > My preference would be to avoid including such an option as it is
> > > > confusing. With such an option in place, users may think that
> > > > slot-sync is completed while that may not be the case.
> > >
> > > Fair enough
> > >
> >
> > I think if we want we may return bool and return false when sync is
> > not complete say due to promotion or other reason like timeout.
> > However, at this stage it is not very clear whether it will be useful
> > to provide additional timeout parameter. But we can consider returning
> > true/false depending on whether we are successful in syncing the slots
> > or not.
>
> I am not very sure if in the current scenario, such a return-value
> will have any value addition. Since this function will be waiting
> indefinitely until all the slots are synced, it is supposed to return
> true in such normal scenarios. If it is interrupted by promotion or
> user cancels it manually, then it is supposed to return false. But in
> those cases, a more helpful approach would be to log a clear WARNING
> or ERROR message like "sync interrupted by promotion" (or similar
> reasons), rather than relying on a return value. In future, if we plan
> to add a timeout-parameter, then this return value makes more sense as
> in normal scenarios as well, as it can easily return false if the
> timeout value is short or the number of slots are huge or are stuck
> waiting on primary.
>
> Additionally, if we do return a value, there may be an expectation
> that the API should also provide details on the list of slots that
> couldn't be synced. That could introduce unnecessary complexity at
> this stage. We can avoid it for now and consider adding such
> enhancements later if we receive relevant customer feedback.
>

Makes sense.

> Please
> note that our recommended approach for syncing slots still remains the
> 'slot sync worker' method.
>

Right.

--
With Regards,
Amit Kapila.
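
The behaviour agreed on in this exchange (wait until every slot is synced, and report an interruption as an error rather than through a boolean result) can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: `sync_slots_blocking`, `SyncInterrupted`, and the two callbacks are hypothetical names invented for illustration, and the real function would sleep or wait on the primary between retries instead of spinning.

```python
class SyncInterrupted(Exception):
    """Hypothetical error raised when the wait is cut short, e.g. by promotion."""


def sync_slots_blocking(pending_slots, primary_caught_up, promotion_requested):
    """Toy model of a blocking slot sync.

    pending_slots: names of slots still to be synced (illustrative input).
    primary_caught_up(slot): callback, True once the primary has advanced
        far enough for this slot to be synced.
    promotion_requested(): callback, True once promotion has been signalled.
    """
    remaining = list(pending_slots)
    while remaining:
        if promotion_requested():
            # Report the reason explicitly instead of returning False.
            raise SyncInterrupted("sync interrupted by promotion")
        # Keep only the slots for which the primary is still behind.
        # (A real implementation would wait between retries.)
        remaining = [s for s in remaining if not primary_caught_up(s)]
    # Falling out of the loop always means success, so a bool result
    # would carry no extra information for the caller.
```

In this model a normal call simply returns, and the caller learns about an interrupted sync from the error message, which matches the preference expressed above for a clear WARNING or ERROR over a return value. A return value would start to carry information only if a timeout parameter were added later.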