Thread

  1. Re: Improve pg_sync_replication_slots() to wait for primary to advance

    Japin Li <japinli@hotmail.com> — 2025-11-28T06:03:44Z

    On Fri, 28 Nov 2025 at 15:46, Ajin Cherian <itsajin@gmail.com> wrote:
    > On Wed, Nov 26, 2025 at 7:42 PM shveta malik <shveta.malik@gmail.com>
    > wrote:
    >>
    >>  A few comments:
    >>
    >> 1)
    >> +/*
    >> + * Structure holding parameters that need to be freed on error in
    >> + * pg_sync_replication_slots()
    >> + */
    >> +typedef struct SlotSyncApiFailureParams
    >> +{
    >> + WalReceiverConn *wrconn;
    >> + List *slot_names;
    >> +} SlotSyncApiFailureParams;
    >> +
    >>
    >> We can get rid of it now as we do not use it.
    >>
    >
    > Removed.
    >
    >> 2)
    >> ProcessSlotSyncInterrupts():
    >>
    >> + if (!AmLogicalSlotSyncWorkerProcess())
    >> + ereport(ERROR,
    >> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    >> + errmsg("cannot continue replication slots synchronization"
    >> +    " as standby promotion is triggered"));
    >> + else
    >> + {
    >>
    >> Can we please reverse the if-else i.e. first worker and then API.
    >> Negated if-condition can be avoided in this case.
    >>
    >
    > Changed.
    >
    >> 3)
    >>
    >> slotsync_reread_config():
    >> + /* Worker-specific check for sync_replication_slots change */
    >>
    >> Now since we check for both API and worker, this comment is not
    >> needed.
    >>
    >
    > Removed.
    >
    >> 4)
    >> - ereport(LOG,
    >> - errmsg("replication slot synchronization worker will restart
    >> because
    >> of a parameter change"));
    >>
    >> - /*
    >> - * Reset the last-start time for this worker so that the postmaster
    >> - * can restart it without waiting for
    >> SLOTSYNC_RESTART_INTERVAL_SEC.
    >> - */
    >> - SlotSyncCtx->last_start_time = 0;
    >> + ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
    >> + errmsg("replication slot synchronization will stop because of a
    >> parameter change"));
    >>
    >> Here, we should retain the same old message for worker i.e. 'worker
    >> will restart..'. instead of 'synchronization will stop'. I find the
    >> old message better in this case.
    >>
    >> 5)
    >> slotsync_reread_config() is slightly difficult to follow.
    >> I think in the case of API, we can display a common error message
    >> instead of 2 different messages for 'sync_replication_slot change'
    >> and
    >> the rest of the parameters. We can mark if any of the parameters
    >> changed in both 'if' blocks and if the current process has not
    >> exited,
    >> then at the end based on 'parameter-changed', we can deal with API
    >> by
    >> giving a common message. Something like:
    >>
    >> /*
    >>  * If we have reached here with a parameter change, we must be
    >> running
    >>  * in SQL function, emit error in such a case.
    >>  */
    >> if (parameter_changed (new variable))
    >> {
    >> Assert (!AmLogicalSlotSyncWorkerProcess);
    >> ereport(ERROR,
    >> errmsg("replication slot synchronization will stop because of a
    >> parameter change"));
    >> }
    >>
    >
    > Fixed as above.
    >
    > I've addressed the above comments as well as rebased the patch based
    > on changes in commit 76b7872 in patch v26
    >
    
    1.
    Initialize slot_persistence_pending to false (to avoid uninitialized values, or
    initialize to true by mistaken) in update_and_persist_local_synced_slot(). This
    aligns with the handling of found_consistent_snapshot and remote_slot_precedes
    in update_local_synced_slot().
    
    diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
    index 20eada3393..c55ba11f17 100644
    --- a/src/backend/replication/logical/slotsync.c
    +++ b/src/backend/replication/logical/slotsync.c
    @@ -617,6 +617,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
            bool            found_consistent_snapshot = false;
            bool            remote_slot_precedes = false;
    
    +       if (slot_persistence_pending)
    +               *slot_persistence_pending = false;
    +
            /* Slotsync skip stats are handled in function update_local_synced_slot() */
            (void) update_local_synced_slot(remote_slot, remote_dbid,
                                                                            &found_consistent_snapshot,
    
    2.
    This change seems unnecessary。
    
     static void
    -slotsync_reread_config(void)
    +slotsync_reread_config()
    
     static void
    -ProcessSlotSyncInterrupts(void)
    +ProcessSlotSyncInterrupts()
    
    3.
    Since we are already caching the result of AmLogicalSlotSyncWorkerProcess() in
    a local worker variable, how about applying this replacement:
    s/if (AmLogicalSlotSyncWorkerProcess())/if (worker)/g?
    
    +	bool		worker = AmLogicalSlotSyncWorkerProcess();
    +	bool		parameter_changed = false;
     
    -	Assert(sync_replication_slots);
    +	if (AmLogicalSlotSyncWorkerProcess())
    +		Assert(sync_replication_slots);
    
    -- 
    Regards,
    Japin Li
    ChengDu WenWu Information Technology Co., Ltd.