Thread

  1. Re: Improve pg_sync_replication_slots() to wait for primary to advance

    Japin Li <japinli@hotmail.com> — 2025-10-31T05:01:51Z

    On Thu, 30 Oct 2025 at 19:15, Chao Li <li.evan.chao@gmail.com> wrote:
    > Hi Ajin,
    >
    > I have reviewed v20 and have a few comments:
    >
    >> On Oct 30, 2025, at 18:18, Ajin Cherian <itsajin@gmail.com> wrote:
    >> 
    >> <v20-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch>
    >
    > 1 - slotsync.c
    > ```
    > +		if (slot_names)
    > +			list_free_deep(slot_names);
    >  
    >  		/* Cleanup the synced temporary slots */
    >  		ReplicationSlotCleanup(true);
    > @@ -1762,5 +2026,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
    >  		/* We are done with sync, so reset sync flag */
    >  		reset_syncing_flag();
    >  	}
    > -	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
    > +	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
    > ```
    >
    > I am afraid there is a risk of a double free. slot_names is assigned to fparams.slot_names inside the for loop, and it is freed after the loop. If something then goes wrong and slotsync_failure_callback() is called, the callback will free fparams.slot_names again.
    >
    
    Agreed.
    
    Maybe we should set fparams.slot_names to NIL immediately after freeing
    the memory, so that a later slotsync_failure_callback() call does not
    free it a second time.
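
To illustrate the idea, here is a minimal standalone sketch, not the actual patch code. It uses plain malloc/free instead of PostgreSQL's List API, and the names FailureParams, free_slot_names(), failure_callback() and run_demo() are hypothetical stand-ins. The point is only that clearing the pointer right after the free makes a later cleanup call a no-op.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's failure-callback parameters. */
typedef struct FailureParams
{
    char  **slot_names;     /* owned array of names, or NULL when not set */
    int     nnames;
} FailureParams;

static int free_calls = 0;  /* counts how often the array was really freed */

/* Free the names and clear the pointer so a second call does nothing. */
static void
free_slot_names(FailureParams *fp)
{
    if (fp->slot_names == NULL)
        return;

    for (int i = 0; i < fp->nnames; i++)
        free(fp->slot_names[i]);
    free(fp->slot_names);

    fp->slot_names = NULL;  /* the reset suggested above */
    fp->nnames = 0;
    free_calls++;
}

/* Models an error-cleanup callback that releases the same memory. */
static void
failure_callback(FailureParams *fp)
{
    free_slot_names(fp);
}

/* Normal-path free followed by a simulated late error. */
static int
run_demo(void)
{
    FailureParams fp = {NULL, 0};
    const char *names[] = {"slot_a", "slot_b"};

    free_calls = 0;
    fp.slot_names = malloc(2 * sizeof(char *));
    assert(fp.slot_names != NULL);
    for (int i = 0; i < 2; i++)
    {
        fp.slot_names[i] = malloc(strlen(names[i]) + 1);
        assert(fp.slot_names[i] != NULL);
        strcpy(fp.slot_names[i], names[i]);
    }
    fp.nnames = 2;

    free_slot_names(&fp);   /* free after the loop */
    failure_callback(&fp);  /* error path runs later: nothing left to free */

    return free_calls;
}
```

In this sketch the memory is released exactly once even though both paths run; without the reset, the second call would free the same pointers again.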
    
    > 2 - slotsync.c
    > ```
    > +			/*
    > +			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
    > +			 * fetch all failover-enabled slots. Note that we reuse slot_names from
    > +			 * the first iteration; re-fetching all failover slots each time could
    > +			 * cause an endless loop. Instead of reprocessing only the pending slots
    > +			 * in each iteration, it's better to process all the slots received in
    > +			 * the first iteration. This ensures that by the time we're done, all
    > +			 * slots reflect the latest values.
    > +			 */
    > +			remote_slots = fetch_remote_slots(wrconn, slot_names);
    > +
    > +			/* Attempt to synchronize slots */
    > +			some_slot_updated = synchronize_slots(wrconn, remote_slots,
    > +												  &slot_persistence_pending);
    > +
    > +			/*
    > +			 * If slot_persistence_pending is true, extract slot names
    > +			 * for future iterations (only needed if we haven't done it yet)
    > +			 */
    > +			if (slot_names == NIL && slot_persistence_pending)
    > +			{
    > +				slot_names = extract_slot_names(remote_slots);
    > +
    > +				/* Update the failure structure so that it can be freed on error */
    > +				fparams.slot_names = slot_names;
    > +			}
    > ```
    >
    > I am thinking if that could be a problem. As you now extract_slot_names() only in the first iteration, if a slot is dropped, and a new slot comes with the same name, will the new slot be incorrectly synced?
    >
    
    The slot name alone is insufficient to distinguish between the old and new
    slots.  In this case, the new slot state will overwrite the old.  I see no
    harm in this behavior, but please confirm if this is the desired behavior.
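
As a toy model of what I mean (not PostgreSQL code; ToySlot, its confirmed_lsn field, toy_sync_by_name() and the LSN values are all made up for illustration), a lookup keyed only on the name has no way to notice that the remote slot was dropped and recreated:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily simplified slot: just a name and one position. */
typedef struct ToySlot
{
    char            name[64];
    unsigned long   confirmed_lsn;
} ToySlot;

/*
 * Copy the remote state into the local slot when the names match.
 * Returns 1 if the local slot was updated, 0 if no remote slot matched.
 */
static int
toy_sync_by_name(ToySlot *local, const ToySlot *remote, int nremote)
{
    for (int i = 0; i < nremote; i++)
    {
        if (strcmp(local->name, remote[i].name) == 0)
        {
            local->confirmed_lsn = remote[i].confirmed_lsn;
            return 1;
        }
    }
    return 0;
}

/* Sync once, then again after the remote slot is dropped and recreated. */
static unsigned long
toy_recreated_slot_demo(void)
{
    ToySlot local = {"sub1_slot", 0};
    ToySlot remote_old[] = {{"sub1_slot", 100}};
    ToySlot remote_new[] = {{"sub1_slot", 500}};    /* same name, new slot */

    toy_sync_by_name(&local, remote_old, 1);
    assert(local.confirmed_lsn == 100);

    /* The name still matches, so the new slot's state replaces the old. */
    toy_sync_by_name(&local, remote_new, 1);

    return local.confirmed_lsn;
}
```

Whether that overwrite is acceptable for the real synced slot is exactly the question above; the sketch only shows why the name by itself cannot tell the two slots apart.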
    
    -- 
    Regards,
    Japin Li
    ChengDu WenWu Information Technology Co., Ltd.