Re: Improve pg_sync_replication_slots() to wait for primary to advance
shveta malik <shveta.malik@gmail.com>
From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>,
Japin Li <japinli@hotmail.com>, Ashutosh Sharma <ashu.coek88@gmail.com>,
Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>,
shveta malik <shveta.malik@gmail.com>
Date: 2025-12-02T09:35:26Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
-
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
-
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
-
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
-
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
-
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
-
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited
On Tue, Dec 2, 2025 at 1:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
>
> Attached patch v27 addresses the above comments.
>
Thanks for the patch. Please find a few comments:
1)
+ /* The worker pid must not be already assigned in SlotSyncCtx */
+ Assert(SlotSyncCtx->pid == InvalidPid);
+
We can mention just 'pid' here instead of 'worker pid'
2)
+ /*
+ * The syscache access in fetch_or_refresh_remote_slots() needs a
+ * transaction env.
+ */
fetch_or_refresh_remote_slots --> fetch_remote_slots
3)
SyncReplicationSlots(WalReceiverConn *wrconn)
{
+
We can get rid of this blank line at the start of the function.
4)
/*
* Shut down the slot sync worker.
*
- * This function sends signal to shutdown slot sync worker, if required. It
+ * This function sends signal to shutdown the slot sync process, if
required. It
* also waits till the slot sync worker has exited or
* pg_sync_replication_slots() has finished.
*/
Shall we change comment to something like (rephrase if required):
Shut down the slot synchronization.
This function wakes up the slot sync process (either worker or backend
running SQL function) and sets stopSignaled=true
so that worker can exit or SQL function pg_sync_replication_slots()
can finish. It also waits till the slot sync worker has exited or
pg_sync_replication_slots() has finished.
5)
We should change the comment atop 'SlotSyncCtxStruct' as well to
mention that this pid is either the slot sync worker's pid or
backend's pid running the SQL function. It is needed by the startup
process to wake these up, so that they can stop synchronization on
seeing stopSignaled. <please rephrase as needed>
6)
+ ereport(LOG,
+ errmsg("replication slot synchronization worker is shutting down on
receiving SIGUSR1"));
SIGUSR1 was actually just a wake-up signal. We may change the comment to:
replication slot synchronization worker is shutting down as promotion
is triggered.
7)
update_synced_slots_inactive_since:
/* The slot sync worker or SQL function mustn't be running by now */
- Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
+ Assert(!SlotSyncCtx->syncing);
Regarding this, I see that 'update_synced_slots_inactive_since' is
only called when we are sure that 'syncing' is false. So shouldn't pid
be also Invalid by that time? Even if it was backend's pid to start
with, but since backend has stopped syncing (finished or error-ed
out),
pid should be reset to Invalid in such a case. And this Assert need
not to be changed.
8)
+ if (sync_process_pid != InvalidPid)
+ kill(sync_process_pid, SIGUSR1);
We can write comments to say wake-up slot sync process.
thanks
Shveta