Thread
-
Fix race in ReplicationSlotRelease for ephemeral slots
Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> — 2026-05-27T11:50:16Z
Hi,

While testing the slot release logic, I noticed a bug in ReplicationSlotRelease() where it may access a replication slot array entry that it has already released itself.

The details are as follows. When releasing an ephemeral replication slot, ReplicationSlotRelease() first drops the slot via ReplicationSlotDropAcquired(). After this point, the slot's entry in the shared memory slot array can be immediately reused by another backend creating a new slot. However, ReplicationSlotRelease() continues executing common cleanup code that still dereferences the old slot pointer and updates shared memory fields such as effective_xmin. If the slot array entry has already been reallocated, these writes can inadvertently affect a different, unrelated slot.

I am attaching a patch that avoids touching the slot's shared-memory state after dropping an ephemeral slot. It keeps the post-release shared-memory updates only for non-ephemeral slots, where the slot remains valid after release.

To reproduce, we can use the following steps:

1. Attach gdb to the backend and set a breakpoint in ReplicationSlotRelease() right after ReplicationSlotDropAcquired() is called.

2. Create an ephemeral slot in the above backend with an invalid output plugin:

   SELECT pg_create_logical_replication_slot('test_slot_dropped', 'pgoutput2', false, false, true);

3. Once the breakpoint is hit, start another backend and create a new slot named 'test_slot_created'.

4. Release the breakpoint and allow the first backend to continue. At this point, you will see it updating the new slot 'test_slot_created' -> active_proc (and effective_xmin, if a snapshot is being exported) to invalid values.

5. Start a third backend and attempt to acquire the same slot 'test_slot_created'. This should not be possible under normal circumstances, but the bug allows it.

I haven't attached a test for this fix, as the change is straightforward and the likelihood of encountering this bug is low, so it may not be worth adding test cycles for it. However, if others feel differently, I'm OK to add one.

Best Regards,
Hou zj
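
The race described above can be illustrated with a small standalone C model. This is only a sketch under stated assumptions, not PostgreSQL source and not the attached patch: ModelSlot, model_create(), model_drop(), model_release_buggy(), model_release_fixed() and the after_drop hook are all hypothetical names, locking is omitted entirely, and the hook stands in for "another backend creates a slot while the first backend is stopped at the breakpoint".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
#define MAX_SLOTS    2
#define INVALID_PROC (-1)
#define INVALID_XID  0u

typedef struct ModelSlot
{
    bool        in_use;
    bool        ephemeral;
    char        name[64];
    int         active_proc;
    unsigned    effective_xmin;
} ModelSlot;

static ModelSlot slots[MAX_SLOTS];

typedef void (*RaceHook) (void);

static void
model_reset(void)
{
    memset(slots, 0, sizeof(slots));
}

/* Take the first free array entry, as a newly created slot would. */
static ModelSlot *
model_create(const char *name, bool ephemeral, int proc, unsigned xmin)
{
    assert(name != NULL);
    for (int i = 0; i < MAX_SLOTS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            slots[i].ephemeral = ephemeral;
            snprintf(slots[i].name, sizeof(slots[i].name), "%s", name);
            slots[i].active_proc = proc;
            slots[i].effective_xmin = xmin;
            return &slots[i];
        }
    }
    return NULL;
}

/* After this, the array entry may be reused by anyone. */
static void
model_drop(ModelSlot *slot)
{
    slot->in_use = false;
}

/* Buggy shape: common cleanup runs even after the entry was dropped. */
static void
model_release_buggy(ModelSlot *slot, RaceHook after_drop)
{
    if (slot->ephemeral)
    {
        model_drop(slot);
        if (after_drop)
            after_drop();       /* window in which the entry is reused */
    }
    slot->active_proc = INVALID_PROC;   /* may hit an unrelated slot */
    slot->effective_xmin = INVALID_XID;
}

/* Fixed shape: do not touch the entry once an ephemeral slot is dropped. */
static void
model_release_fixed(ModelSlot *slot, RaceHook after_drop)
{
    if (slot->ephemeral)
    {
        model_drop(slot);
        if (after_drop)
            after_drop();
        return;
    }
    slot->active_proc = INVALID_PROC;
    slot->effective_xmin = INVALID_XID;
}

/* Simulates the second backend from step 3 of the reproduction. */
static void
other_backend_creates_slot(void)
{
    model_create("test_slot_created", false, 4242, 777u);
}
```

Calling model_release_buggy() on an ephemeral slot with other_backend_creates_slot as the hook leaves 'test_slot_created' with an invalid active_proc and effective_xmin, which corresponds to step 4 of the reproduction. With model_release_fixed() the new slot keeps the values its creator set.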