Re: Improve pg_sync_replication_slots() to wait for primary to advance
shveta malik <shveta.malik@gmail.com>
From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Amit Kapila <amit.kapila16@gmail.com>,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Japin Li <japinli@hotmail.com>, Ashutosh Sharma <ashu.coek88@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>,
shveta malik <shveta.malik@gmail.com>
Date: 2025-12-04T06:03:50Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
-
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
-
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
-
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
-
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
-
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
-
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited
On Thu, Dec 4, 2025 at 10:51 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Wed, Dec 3, 2025 at 10:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 3, 2025 at 8:51 AM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> > > Attaching patch v28 addressing these comments.
> > >
> >
> > Can we extract the part of the patch that handles SIGUSR1 signal
> > separately as a first patch and the remaining as a second patch?
> > Please do mention the reason in the commit message as to why we are
> > changing the signal for SIGINT to SIGUSR1.
> >
>
> I have extracted out the SIGUSR1 signal handling changes separately
> into a patch and sharing. I will share the next patch later.
> Let me know if there are any comments for this patch.
>
I have just 2 trivial comments for v29-001:
1)
- * receives a SIGINT from the startup process, or when there is an error.
+ * receives a SIGUSR1 from the startup process, or when there is an error.
In above we should mention stopSignaled rather than SIGUSR1, as
SIGUSR1 is just a wakeup signal and not termination signal.
2)
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot continue replication slot synchronization"
+ " as standby promotion is triggered"));
Please mention that it is SQL-function in the comment for else-block.
~~
I tested the touched scenarios and here are the LOGs:
a)
When promotion is ongoing and the startup process has terminated
slot-sync worker but if the postmaster has not noticed that, it may
end up starting slotsync worker again. For that scenario, we get
these:
11:03:19.712 IST [151559] LOG: replication slot synchronization
worker is shutting down as promotion is triggered
11:03:19.726 IST [151629] LOG: slot sync worker started
11:03:19.795 IST [151629] LOG: replication slot synchronization
worker is shutting down as promotion is triggered
b)
On promotion, API gets this (originating from ProcessSlotSyncInterrupts now):
postgres=# SELECT pg_sync_replication_slots();
ERROR: cannot continue replication slot synchronization as standby
promotion is triggered
c)
If any parameter is changed between ValidateSlotSyncParams() and
ProcessSlotSyncInterrupts() for API, we get this:
postgres=# SELECT pg_sync_replication_slots();
ERROR: replication slot synchronization will stop because of a parameter change
--on re-run (originating from ValidateSlotSyncParams())
postgres=# SELECT pg_sync_replication_slots();
ERROR: replication slot synchronization requires
"hot_standby_feedback" to be enabled
~~
The tested scenarios' behaviour looks good to me.
thanks
Shveta