Re: Improve pg_sync_replication_slots() to wait for primary to advance
shveta malik <shveta.malik@gmail.com>
From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>,
Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>,
shveta malik <shveta.malik@gmail.com>
Date: 2025-08-22T05:44:45Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
-
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
-
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
-
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
-
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
-
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
-
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited
On Wed, Aug 20, 2025 at 10:53 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
>
> I've removed them.
> Attaching patch v8 addressing the above comments.
>
Thanks for the patch. Please find a few comments:
1)
When the API is in progress, and meanwhile in another session we turn
off hot_standby_feedback, the API session terminates abnormally.
postgres=# SELECT pg_sync_replication_slots();
server closed the connection unexpectedly
It seems slotsync_reread_config() is not adjusted for API. It does
proc_exit assuming it is a worker process.
2)
slotsync_worker_onexit():
SlotSyncCtx->slot_persistence_pending = false;
/*
* If syncing_slots is true, it indicates that the process errored out
* without resetting the flag. So, we need to clean up shared memory and
* reset the flag here.
*/
if (syncing_slots)
{
SlotSyncCtx->syncing = false;
syncing_slots = false;
}
Shall we reset slot_persistence_pending inside 'if (syncing_slots)'?
slot_persistence_pending can not be true without syncing_slots being
true.
3)
reset_syncing_flag():
SpinLockAcquire(&SlotSyncCtx->mutex);
SlotSyncCtx->syncing = false;
+ SlotSyncCtx->slot_persistence_pending = false;
SpinLockRelease(&SlotSyncCtx->mutex);
Here we are changing slot_persistence_pending under mutex, while at
other places, it is not protected by mutex. Is it intentional here?
4)
On rethinking, we maintain anything in shared memory if it has to be
shared between a few processes. 'slot_persistence_pending' OTOH is
required to be set and accessed by only one process at a time. Shall
we move it out of SlotSyncCtxStruct and keep it static similar to
'syncing_slots'? Rest of the setting, resetting flow remains the same.
What do you think?
5)
/* Build the query based on whether we're fetching all or refreshing
specific slots */
Perhaps we can shift this comment to where we actually append
target_slot_list. Better to have it something like:
'If target_slot_list is provided, construct the query only to fetch given slots'
thanks
Shveta