Re: Improve pg_sync_replication_slots() to wait for primary to advance
Ajin Cherian <itsajin@gmail.com>
From: Ajin Cherian <itsajin@gmail.com>
To: shveta malik <shveta.malik@gmail.com>
Cc: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>,
Japin Li <japinli@hotmail.com>, Ashutosh Sharma <ashu.coek88@gmail.com>,
Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-12-03T03:21:35Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
-
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
-
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
-
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
-
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
-
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
-
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited
Attachments
- v28-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch (application/octet-stream) patch v28-0001
On Tue, Dec 2, 2025 at 8:35 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Dec 2, 2025 at 1:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> >
> > Attached patch v27 addresses the above comments.
> >
>
> Thanks for the patch. Please find a few comments:
>
> 1)
> + /* The worker pid must not be already assigned in SlotSyncCtx */
> + Assert(SlotSyncCtx->pid == InvalidPid);
> +
>
> We can mention just 'pid' here instead of 'worker pid'
>
Changed.
> 2)
> + /*
> + * The syscache access in fetch_or_refresh_remote_slots() needs a
> + * transaction env.
> + */
> fetch_or_refresh_remote_slots --> fetch_remote_slots
>
> 3)
> SyncReplicationSlots(WalReceiverConn *wrconn)
> {
> +
>
> We can get rid of this blank line at the start of the function.
>
Fixed.
> 4)
> /*
> * Shut down the slot sync worker.
> *
> - * This function sends signal to shutdown slot sync worker, if required. It
> + * This function sends signal to shutdown the slot sync process, if
> required. It
> * also waits till the slot sync worker has exited or
> * pg_sync_replication_slots() has finished.
> */
> Shall we change comment to something like (rephrase if required):
>
> Shut down the slot synchronization.
> This function wakes up the slot sync process (either worker or backend
> running SQL function) and sets stopSignaled=true
> so that worker can exit or SQL function pg_sync_replication_slots()
> can finish. It also waits till the slot sync worker has exited or
> pg_sync_replication_slots() has finished.
>
Changed.
> 5)
> We should change the comment atop 'SlotSyncCtxStruct' as well to
> mention that this pid is either the slot sync worker's pid or
> backend's pid running the SQL function. It is needed by the startup
> process to wake these up, so that they can stop synchronization on
> seeing stopSignaled. <please rephrase as needed>
>
Changed.
> 6)
> + ereport(LOG,
> + errmsg("replication slot synchronization worker is shutting down on
> receiving SIGUSR1"));
>
> SIGUSR1 was actually just a wake-up signal. We may change the comment to:
> replication slot synchronization worker is shutting down as promotion
> is triggered.
>
Changed.
> 7)
> update_synced_slots_inactive_since:
> /* The slot sync worker or SQL function mustn't be running by now */
> - Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
> + Assert(!SlotSyncCtx->syncing);
>
> Regarding this, I see that 'update_synced_slots_inactive_since' is
> only called when we are sure that 'syncing' is false. So shouldn't pid
> be also Invalid by that time? Even if it was backend's pid to start
> with, but since backend has stopped syncing (finished or error-ed
> out),
> pid should be reset to Invalid in such a case. And this Assert need
> not to be changed.
>
Fixed.
> 8)
>
> + if (sync_process_pid != InvalidPid)
> + kill(sync_process_pid, SIGUSR1);
>
> We can write comments to say wake-up slot sync process.
>
Added comments.
Attaching patch v28 addressing these comments.
regards,
Ajin Cherian
Fujitsu Australia