Re: Improve pg_sync_replication_slots() to wait for primary to advance
From: Ajin Cherian <itsajin@gmail.com>
To: shveta malik <shveta.malik@gmail.com>
Cc: PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-07-31T09:40:57Z
Lists: pgsql-hackers
Commits
- Enhance slot synchronization API to respect promotion signal.
  - 4bed04d39566 17.10 landed
  - 94efd308bcec 18.4 landed
  - 1362bc33e025 19 (unreleased) landed
- Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
  - f1ddaa15357f 19 (unreleased) landed
- Refactor slot synchronization logic in slotsync.c.
  - 788ec96d591d 19 (unreleased) landed
- Fix intermittent BF failure in 040_standby_failover_slots_sync.
  - b47c50e5667b 19 (unreleased) landed
- Add retry logic to pg_sync_replication_slots().
  - 0d2d4a0ec3ec 19 (unreleased) landed
- Fix LOCK_TIMEOUT handling in slotsync worker.
  - 04396eacd3fa 19 (unreleased) cited
- Add slotsync skip statistics.
  - 76b78721ca49 19 (unreleased) cited
Attachments
- v3-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patch (application/octet-stream) patch v3-0001
On Thu, Jul 17, 2025 at 2:04 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Jul 16, 2025 at 3:47 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > > I am not able to apply the patch to the latest head or even to a week
> > > back version. Can you please check and rebase?
> > >
> > > thanks
> > > Shveta
> >
> > Rebased.
> >
>
> Thanks. Please find a few comments:
>
>
> 1)
> /* Any slot with NULL in these fields should not have made it this far */
>
> It is good to get rid of the case where we had checks for NULL
> confirmed_lsn and catalog_xmin (i.e. when slot was in RS_EPHEMERAL
> state), as that has already been checked by synchronize_slots() and
> such a slot will not even reach wait_for_primary_slot_catchup(). But a
> slot can still be invalidated on primary anytime, and thus during this
> wait, we should check for primary's invalidation as we were doing in
> v1.
>
I've added back the check for invalidated slots.
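
For readers following along, the idea in comment 1 can be illustrated with a small standalone sketch. It is not the patch's code: RemoteSlotSample, WaitResult and wait_for_catchup_sketch() are hypothetical names, the polls of the primary are replaced by an array of samples, and only restart_lsn is compared (the catalog_xmin comparison is left out for brevity). The point it shows is that invalidation is checked on every poll before the catch-up test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for what one poll of the primary returns. */
typedef struct RemoteSlotSample
{
    bool     invalidated;   /* slot was invalidated on the primary */
    uint64_t restart_lsn;   /* primary's restart_lsn seen by this poll */
} RemoteSlotSample;

typedef enum WaitResult
{
    WAIT_CAUGHT_UP,         /* primary has advanced far enough */
    WAIT_SLOT_INVALIDATED,  /* give up: the remote slot is unusable */
    WAIT_GAVE_UP            /* ran out of polls without catching up */
} WaitResult;

/*
 * Walk through successive polls of the primary. Invalidation is checked
 * first on every iteration, because the slot can be invalidated at any
 * time while we are waiting.
 */
WaitResult
wait_for_catchup_sketch(const RemoteSlotSample *samples, int nsamples,
                        uint64_t local_restart_lsn)
{
    assert(samples != NULL || nsamples == 0);

    for (int i = 0; i < nsamples; i++)
    {
        if (samples[i].invalidated)
            return WAIT_SLOT_INVALIDATED;

        if (samples[i].restart_lsn >= local_restart_lsn)
            return WAIT_CAUGHT_UP;
    }

    return WAIT_GAVE_UP;
}
```

In this sketch a slot that is invalidated on the second poll ends the wait immediately, even if a later poll would have shown the primary catching up.
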
> 2)
> + * If in SQL API synchronization, and we've been promoted, then no point
>
> extra space before promoted.
Fixed.
>
> 3)
>
> + if (!AmLogicalSlotSyncWorkerProcess() && PromoteIsTriggered())
>
> We don't need 'AmLogicalSlotSyncWorkerProcess' as that is already
> checked at the beginning of this function.
>
Fixed.
> 4)
> + ereport(WARNING,
> + errmsg("aborting sync for slot \"%s\"",
> + remote_slot->name),
> + errdetail("Promotion occurred before this slot was fully"
> + " synchronized."));
> + pfree(cmd.data);
> +
> + return false;
>
> a) Please add an error-code.
>
> b) Shall we change msg to
>
> errmsg("aborting sync for slot \"%s\"",
> remote_slot->name),
> errhint("%s cannot be executed once promotion is triggered.",
> "pg_sync_replication_slots()")));
>
>
Since there is already an error return at the start of the function when
promotion is triggered, I've kept the same error code and message here as
well for consistency.
>
> 5)
> Instead of using PromoteIsTriggered, shall we rely on
> 'SlotSyncCtx->stopSignaled' as we do when we start this API?
>
Fixed.
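
As a rough illustration of comments 4 and 5 together (again, not the patch's code), the sketch below routes both the check at the start of the API and the check inside the wait loop through one helper that reads a single flag, so both places react to the same condition in the same way. SyncCtxSketch, sync_must_stop() and sync_with_stop_checks() are hypothetical names, the promote_at parameter only simulates the flag being set mid-wait, and the locking that protects the real shared flag is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared slot sync state; only the flag matters. */
typedef struct SyncCtxSketch
{
    bool stop_signaled;     /* set when promotion asks synchronization to stop */
} SyncCtxSketch;

/* Single helper so every check point tests the same condition. */
static bool
sync_must_stop(const SyncCtxSketch *ctx)
{
    assert(ctx != NULL);
    return ctx->stop_signaled;
}

/*
 * Returns the number of wait iterations completed, or -1 if the stop
 * flag was seen either at entry or during the wait. promote_at is the
 * iteration at which the sketch simulates promotion (-1 for never).
 */
int
sync_with_stop_checks(SyncCtxSketch *ctx, int max_iterations, int promote_at)
{
    int done = 0;

    /* Check once at entry, as the API already does when it starts. */
    if (sync_must_stop(ctx))
        return -1;

    while (done < max_iterations)
    {
        if (done == promote_at)
            ctx->stop_signaled = true;  /* simulate promotion mid-wait */

        /* Same check, same outcome, inside the wait loop. */
        if (sync_must_stop(ctx))
            return -1;

        done++;
    }

    return done;
}
```

Using one helper for both check points is what keeps the reported outcome identical whether the stop is noticed at entry or partway through the wait.
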
> 6)
> In logicaldecoding.sgml, we can get rid of "Additionally, enabling
> sync_replication_slots on the standby is required" to make it the same
> as what we had prior to the patch I pointed to earlier.
>
> Or, better, we can refine it as below. Thoughts?
>
> The logical replication slots on the primary can be enabled for
> synchronization to the hot standby by using the failover parameter of
> pg_create_logical_replication_slot, or by using the failover option of
> CREATE SUBSCRIPTION during slot creation. After that, synchronization
> can be performed either manually by calling pg_sync_replication_slots
> on the standby, or automatically by enabling sync_replication_slots on
> the standby. When sync_replication_slots is enabled, the failover
> slots are periodically synchronized by the slot sync worker. For the
> synchronization to work, .....
>
Updated as above.
Patch v3 attached.
Regards,
Ajin Cherian
Fujitsu Australia