Re: Improve pg_sync_replication_slots() to wait for primary to advance
shveta malik <shveta.malik@gmail.com>
From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>,
Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>,
shveta malik <shveta.malik@gmail.com>
Date: 2025-09-04T06:35:16Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
-
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
-
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
-
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
-
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
-
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
-
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited
On Wed, Sep 3, 2025 at 3:19 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Wed, Sep 3, 2025 at 6:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> > > Attaching v10 with the above changes.
> > >
> >
> > The patch does not apply on HEAD. Can you please rebase?
>
> Rebased and made a small change as well to use TopMemoryContext rather
> than create a new context for slot_list.
>
Thanks for the patch. Please find a few comments:
1)
/* Clean up slot_names if allocated in TopMemoryContext */
if (slot_names)
{
oldcontext = MemoryContextSwitchTo(TopMemoryContext);
list_free_deep(slot_names);
MemoryContextSwitchTo(oldcontext);
}
I think we can free slot_names without switching the context. Can you
please check this?
2)
We should add a comment for:
a) why we are using the slot-names from the first cycle instead of
fetching all failover slots in each cycle.
b) why are we relocating remote_slot list everytime.
3)
@@ -1130,7 +1180,7 @@ slotsync_reread_config(void)
- Assert(sync_replication_slots);
+ Assert(!AmLogicalSlotSyncWorkerProcess() || sync_replication_slots);
Do we still need this change after slotsync_api_reread_config?
4)
+static void ProcessSlotSyncInterrupts(void);
This is not needed.
5)
+ /* update flag, so that we retry */
+ slot_persistence_pending = true;
Can we tweak it to: 'Update the flag so that the API can retry'
6)
SyncReplicationSlots():
+ /* Free the current remote_slots list */
+ if (remote_slots)
+ list_free_deep(remote_slots);
Do we need a 'remote_slots' check, won't it manage it internally? We
don't have it in ReplSlotSyncWorkerMain().
7)
slotsync_api_reread_config
+ ereport(ERROR,
+ (errcode(ERRCODE_CONFIG_FILE_ERROR),
+ errmsg("cannot continue slot synchronization due to parameter changes"),
+ errdetail("Critical replication parameters (primary_conninfo,
primary_slot_name, or hot_standby_feedback) have changed since
pg_sync_replication_slots() started."),
+ errhint("Retry pg_sync_replication_slots() to use the updated
configuration.")));
I am unsure if we need to mention '(primary_conninfo,
primary_slot_name, or hot_standby_feedback)', but would like to know
what others think.
thanks
Shveta