Thread
-
Re: Improve pg_sync_replication_slots() to wait for primary to advance
Japin Li <japinli@hotmail.com> — 2025-10-31T05:01:51Z
On Thu, 30 Oct 2025 at 19:15, Chao Li <li.evan.chao@gmail.com> wrote:
> Hi Ajin,
>
> I have reviewed v20 and got a few comments:
>
>> On Oct 30, 2025, at 18:18, Ajin Cherian <itsajin@gmail.com> wrote:
>>
>> <v20-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch>
>
> 1 - slotsync.c
> ```
> +    if (slot_names)
> +        list_free_deep(slot_names);
>
>      /* Cleanup the synced temporary slots */
>      ReplicationSlotCleanup(true);
> @@ -1762,5 +2026,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
>      /* We are done with sync, so reset sync flag */
>      reset_syncing_flag();
>  }
> -    PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
> +    PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
> ```
>
> I am afraid there is a risk of double memory free. Slot_names has been assigned to fparams.slot_names within the for loop, and it’s freed after the loop. If something gets wrong and slotsync_failure_callback() is called, the function will free fparams.slot_names again.
>

Agreed. Maybe we should set the fparams.slot_names to NIL immediately after freeing the memory.

> 2 - slotsync.c
> ```
> +    /*
> +     * Fetch remote slot info for the given slot_names. If slot_names is NIL,
> +     * fetch all failover-enabled slots. Note that we reuse slot_names from
> +     * the first iteration; re-fetching all failover slots each time could
> +     * cause an endless loop. Instead of reprocessing only the pending slots
> +     * in each iteration, it's better to process all the slots received in
> +     * the first iteration. This ensures that by the time we're done, all
> +     * slots reflect the latest values.
> +     */
> +    remote_slots = fetch_remote_slots(wrconn, slot_names);
> +
> +    /* Attempt to synchronize slots */
> +    some_slot_updated = synchronize_slots(wrconn, remote_slots,
> +                                          &slot_persistence_pending);
> +
> +    /*
> +     * If slot_persistence_pending is true, extract slot names
> +     * for future iterations (only needed if we haven't done it yet)
> +     */
> +    if (slot_names == NIL && slot_persistence_pending)
> +    {
> +        slot_names = extract_slot_names(remote_slots);
> +
> +        /* Update the failure structure so that it can be freed on error */
> +        fparams.slot_names = slot_names;
> +    }
> ```
>
> I am thinking if that could be a problem. As you now extract_slot_names() only in the first iteration, if a slot is dropped, and a new slot comes with the same name, will the new slot be incorrectly synced?
>

The slot name alone is insufficient to distinguish between the old and new slots. In this case, the new slot state will overwrite the old. I see no harm in this behavior, but please confirm if this is the desired behavior.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
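
For reference, the reset-after-free idea suggested for the double-free concern in comment 1 can be sketched in plain C. This is only an illustration under stated assumptions, not the patch's code: `DemoFailureParams`, `demo_set_slot_names()`, `demo_free_slot_names()` and `demo_failure_callback()` are hypothetical stand-ins, and a `NULL` pointer plays the role that `NIL` plays for a PostgreSQL `List`. In the patch itself the free would be `list_free_deep()`, followed by resetting `fparams.slot_names`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parameters handed to the failure callback. */
typedef struct DemoFailureParams
{
    char      **slot_names;     /* owned array of owned strings, or NULL */
    size_t      nslots;
} DemoFailureParams;

/* Counts how many times the names were really freed (for the demo only). */
static int demo_free_calls = 0;

/* Free the names and clear the pointer so a second call becomes a no-op. */
static void
demo_free_slot_names(DemoFailureParams *fparams)
{
    if (fparams->slot_names == NULL)
        return;                 /* already released, nothing to do */

    for (size_t i = 0; i < fparams->nslots; i++)
        free(fparams->slot_names[i]);
    free(fparams->slot_names);

    /* The important step: reset right after freeing. */
    fparams->slot_names = NULL;
    fparams->nslots = 0;
    demo_free_calls++;
}

/* Store private copies of the given names; returns 0 on success, -1 on OOM. */
static int
demo_set_slot_names(DemoFailureParams *fparams, const char *const *names,
                    size_t n)
{
    char      **copy = calloc(n, sizeof(char *));

    if (copy == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
    {
        copy[i] = malloc(strlen(names[i]) + 1);
        if (copy[i] == NULL)
        {
            /* Undo the partial allocation before reporting failure. */
            while (i > 0)
                free(copy[--i]);
            free(copy);
            return -1;
        }
        strcpy(copy[i], names[i]);
    }

    fparams->slot_names = copy;
    fparams->nslots = n;
    return 0;
}

/* Error-path cleanup: safe even if the normal path already freed the names. */
static void
demo_failure_callback(DemoFailureParams *fparams)
{
    demo_free_slot_names(fparams);
    assert(fparams->slot_names == NULL);
}
```

Because the pointer is cleared in the same place it is freed, the normal path and the error path can both call the cleanup without tracking which one ran first.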