Re: Improve pg_sync_replication_slots() to wait for primary to advance

Amit Kapila <amit.kapila16@gmail.com>

From: Amit Kapila <amit.kapila16@gmail.com>
To: "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>
Cc: shveta malik <shveta.malik@gmail.com>, Ajin Cherian <itsajin@gmail.com>, Yilin Zhang <jiezhilove@126.com>, Chao Li <li.evan.chao@gmail.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Japin Li <japinli@hotmail.com>, Ashutosh Sharma <ashu.coek88@gmail.com>, PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
Date: 2025-12-18T11:39:37Z
Lists: pgsql-hackers

Commits

  1. Enhance slot synchronization API to respect promotion signal.

  2. Fix inconsistent elevel in pg_sync_replication_slots() retry logic.

  3. Refactor slot synchronization logic in slotsync.c.

  4. Fix intermittent BF failure in 040_standby_failover_slots_sync.

  5. Add retry logic to pg_sync_replication_slots().

  6. Fix LOCK_TIMEOUT handling in slotsync worker.

  7. Add slotsync skip statistics.

On Wed, Dec 17, 2025 at 3:58 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Here is a small patch to fix it.
>

Thanks, I've pushed the patch. BTW, while looking at the slot_sync API
code path, I thought of the following improvements.

*
    if (remote_slot->confirmed_lsn > latestFlushPtr)
    {
        update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);

        /*
         * Can get here only if GUC 'synchronized_standby_slots' on the
         * primary server was not configured correctly.
         */
        ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
                errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

Can we change this ERROR to LOG for the API as well, given that the
API now also retries syncing the slots during the initial sync?
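One way to read this proposal is that the elevel should depend on whether the caller will retry, not on which process is running; since both the worker and the API now retry, both would report the skip at LOG. The sketch below models only that decision. The names `SketchElevel`, `SketchLsn`, `SkipDecision` and `check_confirmed_lsn()` are hypothetical stand-ins, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's error levels and XLogRecPtr. */
typedef enum
{
    SKETCH_LOG,
    SKETCH_ERROR
} SketchElevel;

typedef uint64_t SketchLsn;

typedef struct SkipDecision
{
    bool         skip;      /* slot cannot be synced in this attempt */
    SketchElevel elevel;    /* level at which the skip would be reported */
} SkipDecision;

/*
 * Decide whether to skip a slot whose confirmed_lsn is ahead of what the
 * standby has flushed. The elevel depends on whether the caller will retry,
 * rather than on which process is running.
 */
static SkipDecision
check_confirmed_lsn(SketchLsn remote_confirmed_lsn, SketchLsn latest_flush_ptr,
                    bool caller_retries)
{
    SkipDecision decision = {false, SKETCH_LOG};

    if (remote_confirmed_lsn > latest_flush_ptr)
    {
        decision.skip = true;
        /* A retrying caller can log and try again in its next cycle. */
        decision.elevel = caller_retries ? SKETCH_LOG : SKETCH_ERROR;
    }
    return decision;
}
```

Under this framing, `check_confirmed_lsn(200, 100, true)` reports a skip at LOG, and only a hypothetical one-shot caller would still raise an ERROR.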

* The use of the slot_persistence_pending flag in the internal APIs
seems to be the reverse of what it should be. That is, it should
initially be true, and we can set it to false once the slot has
actually been persisted.
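A minimal model of the suggested flag semantics follows, assuming a simplified per-slot attempt record. `SyncAttempt`, `sync_attempt_init()` and `try_sync_slot()` are hypothetical names and are not taken from slotsync.c. The flag starts as true and is cleared only where the slot is actually persisted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of one slot sync attempt (not slotsync.c).
 * persistence_pending starts out true and is cleared only at the single
 * point where the slot is actually persisted.
 */
typedef struct SyncAttempt
{
    const char *slot_name;
    bool        persistence_pending;
} SyncAttempt;

static void
sync_attempt_init(SyncAttempt *attempt, const char *slot_name)
{
    attempt->slot_name = slot_name;
    attempt->persistence_pending = true;   /* nothing persisted yet */
}

/* Hypothetical sync step; remote_caught_up simulates the standby being ready. */
static void
try_sync_slot(SyncAttempt *attempt, bool remote_caught_up)
{
    assert(attempt->slot_name != NULL);

    /* Any early exit leaves the flag true, so no skip path has to set it. */
    if (!remote_caught_up)
        return;

    /* ... the slot would be persisted here ... */
    attempt->persistence_pending = false;
}
```

A plausible benefit of this direction is that a newly added skip path cannot forget to mark the slot as pending. The only code that has to touch the flag is the code that actually persists the slot.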

* We could retry syncing all the slots that were present on the
primary at the start of the API call, not only the temporary slots. If
we do this, the previous point may no longer be required. Also, please
add something like: "It retries cyclically until all the failover slots
that existed on the primary at the start of the function call are
synchronized." to the function description [1] as well.
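The suggested behaviour can be sketched as follows: take the list of failover slots once at the start of the call, then cycle over the entries that are not yet synced until none remain. `RemoteSlot`, `try_sync_one()`, `sync_snapshot_until_done()` and the simulated `attempts_needed` counter are all hypothetical. In this simulation every slot eventually syncs, so the loop terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SNAPSHOT_SLOTS 8

/* Hypothetical view of a failover slot listed on the primary. */
typedef struct RemoteSlot
{
    const char *name;
    int         attempts_needed;    /* simulated: attempts until sync succeeds */
} RemoteSlot;

/* Hypothetical single sync attempt; succeeds once attempts_needed hits 0. */
static bool
try_sync_one(RemoteSlot *slot)
{
    if (slot->attempts_needed > 0)
        slot->attempts_needed--;
    return slot->attempts_needed == 0;
}

/*
 * Cycle over the slots captured at the start of the call until every one of
 * them is synchronized; slots created on the primary afterwards are not
 * waited for. Returns the number of cycles taken.
 */
static int
sync_snapshot_until_done(RemoteSlot *snapshot, size_t nslots)
{
    bool   synced[MAX_SNAPSHOT_SLOTS] = {false};
    size_t remaining = nslots;
    int    cycles = 0;

    assert(nslots <= MAX_SNAPSHOT_SLOTS);

    while (remaining > 0)
    {
        cycles++;
        for (size_t i = 0; i < nslots; i++)
        {
            if (synced[i])
                continue;
            if (try_sync_one(&snapshot[i]))
            {
                synced[i] = true;
                remaining--;
            }
        }
        /* A real implementation would wait here and check for interrupts. */
    }
    return cycles;
}
```

With slots needing 1, 3 and 0 attempts, the call finishes after three cycles. Because the snapshot is fixed, the termination condition is "everything that existed at the start is synced", which is the behaviour the proposed documentation sentence describes.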

[1] - https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-REPLICATION
-- 
With Regards,
Amit Kapila.