Re: Improve pg_sync_replication_slots() to wait for primary to advance
shveta malik <shveta.malik@gmail.com>
From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Ashutosh Sharma <ashu.coek88@gmail.com>,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>,
shveta malik <shveta.malik@gmail.com>
Date: 2025-09-26T10:14:36Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
-
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
-
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
-
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
-
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
-
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
-
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited
On Wed, Sep 24, 2025 at 5:35 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Tue, Sep 23, 2025 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Mon, Sep 22, 2025 at 4:21 PM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> > >
> > > Created a patch v13 with these changes.
> > >
> >
> > Please find a few comments:
> >
> > 1)
> > + /* update the failure structure so that it can be freed on error */
> > + fparams.slot_names = slot_names;
> > +
> >
> > Since slot_names is assigned only once, we can make the above
> > assignment as well only once, inside the if-block where we initialize
> > slot_names.
> >
>
> Changed.
>
> > 2)
> > extract_slot_names():
> >
> > + foreach(lc, remote_slots)
> > + {
> > + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
> > + char *slot_name;
> > +
> > + /* Switch to long-lived TopMemoryContext to store slot names */
> > + oldcontext = MemoryContextSwitchTo(TopMemoryContext);
> > +
> > + slot_name = pstrdup(remote_slot->name);
> > + slot_names = lappend(slot_names, slot_name);
> > +
> > + MemoryContextSwitchTo(oldcontext);
> > + }
> >
> > It will be better to move 'MemoryContextSwitchTo' calls outside of the
> > loop. No need to switch the context for each slot.
> >
>
> Changed.
>
> > 3)
> > ProcessSlotSyncAPIChanges() gives a feeling that it is actually
> > processing API changes where instead it is processing interrupts or
> > config changes. Can we please rename to ProcessSlotSyncAPIInterrupts()
> >
>
> Changed.
>
> > 4)
> > I prefer version 11's slotsync_api_reread_config() over current
> > slotsync_api_config_changed(). There, even error was taken care of
> > inside the function, which to me looked better and similar to how
> > slotsync worker deals with it.
> >
>
> Changed.
>
> > I have made some comment changes, attached the patch. Please include
> > it if you find it okay.
>
> Incorporated.
>
>
>
> On Tue, Sep 23, 2025 at 4:42 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> >
> > Tested the patch, few more suggestions
> >
> > 5)
> > Currently the error message is:
> >
> > postgres=# SELECT pg_sync_replication_slots();
> > ERROR: cannot continue slot synchronization due to parameter changes
> > DETAIL: Critical replication parameters (primary_conninfo,
> > primary_slot_name, or hot_standby_feedback) have changed since
> > pg_sync_replication_slots() started.
> > HINT: Retry pg_sync_replication_slots() to use the updated configuration.
> >
> > a)
> > To be consistent with other error-messages, can we change ERROR msg
> > to: 'cannot continue replication slots synchronization due to
> > parameter changes'
> >
>
> Changed.
It seems it is missed to be changed perhaps due to bringing back
previous implementation of slotsync_api_reread_config()
>
> > b)
> > There is double space in DETAIL msg: "have changed since"
> >
> > Will it be better to shorten the DETAIL as: 'One or more of
> > primary_conninfo, primary_slot_name, or hot_standby_feedback were
> > modified.'
> >
>
> Changed.
>
This too is missed.
> > 6)
> > postgres=# SELECT pg_sync_replication_slots();
> > ERROR: exiting from slot synchronization as promotion is triggered
> >
> > Shall we rephrase it similar to the previous message: 'cannot continue
> > replication slots synchronization as standby promotion is triggered'
> >
> Changed.
>
> Attaching patch v14 incorporating the above changes.
>
Few trivial comments:
1)
slotsync_api_reread_config():
+ * Returns true if conninfo, primary_slot_name or hot_standby_feedback changed.
This comment is no longer valid, we can remove it.
2)
/* We are done with sync, so reset sync flag */
reset_syncing_flag();
+
This extra blank line is not needed.
thanks
Shveta