Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: Japin Li <japinli@hotmail.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Ashutosh Sharma <ashu.coek88@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>, shveta malik <shveta.malik@gmail.com>
Date: 2025-10-31T05:51:16Z
Lists: pgsql-hackers

Commits

  1. Enhance slot synchronization API to respect promotion signal.

  2. Fix inconsistent elevel in pg_sync_replication_slots() retry logic.

  3. Refactor slot synchronization logic in slotsync.c.

  4. Fix intermittent BF failure in 040_standby_failover_slots_sync.

  5. Add retry logic to pg_sync_replication_slots().

  6. Fix LOCK_TIMEOUT handling in slotsync worker.

  7. Add slotsync skip statistics.

On Fri, Oct 31, 2025 at 11:04 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Oct 30, 2025 at 3:48 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> >
> > Thanks for your review, Japin. Here's patch v20 addressing the comments.
> >
>
> Thank you for the patch. Please find a few comments on the test:
>
>
> 1)
> +# until the slot becomes sync-ready (when the standby catches up to the
> +# slot's restart_lsn).
>
> I think it should be 'when the primary server catches up' or 'when the
> remote slot catches up with the locally reserved position.'
>
> 2)
> +# Attempt to synchronize slots using API. This will initially fail because
> +# the slot is not yet sync-ready (standby hasn't caught up to slot's restart_lsn),
> +# but the API will wait and retry. Call the API in a background process.
>
> a)
> 'This will initially fail' reads as though the API will return an error,
> which is not the case.
>
> b) 'standby hasn't caught up to slot's restart_lsn' is not correct.
>
> We can rephrase to:
> # Attempt to synchronize slots using the API. The API will continue
> retrying synchronization until the remote slot catches up with the
> locally reserved position.
>
> 3)
> +# Enable the Subscription, so that the slot catches up
>
> slot --> remote slot
>
> 4)
> +# Create xl_running_xacts records on the primary for which the standby is waiting
>
> Shall we rephrase it as below, or as anything better you may have in mind?
> Create xl_running_xacts on the primary to speed up restart_lsn advancement.
>
> 5)
> +# Confirm that the logical failover slot is created on the standby and is
> +# flagged as 'synced'
>
> Suggestion:
> Verify that the logical failover slot is created on the standby,
> marked as 'synced', and persisted.
>
> (It is important to mention 'persisted' because even a temporary slot is
> marked as synced.)
>

Shall we remove this change as it does not belong to the current patch
directly? I think it was a suggestion earlier, but we shall remove it.

6)
-# Confirm the synced slot 'lsub1_slot' is retained on the new primary
+# Confirm that the synced slots 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
 is( $standby1->safe_psql(
  'postgres',
  q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'snap_test_slot') AND synced AND NOT temporary;}
+

thanks
Shveta