Thread

  1. Fix race in ReplicationSlotRelease for ephemeral slots

    Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> — 2026-05-27T11:50:16Z

    Hi,
    
    While testing the slot release logic, I noticed a bug in
    ReplicationSlotRelease(): it may access a replication slot array entry that
    it has itself already dropped.
    
    In detail: when releasing an ephemeral replication slot,
    ReplicationSlotRelease() first drops the slot via ReplicationSlotDropAcquired().
    After this point, the slot's entry in the shared-memory slot array can be
    immediately reused by another backend creating a new slot.
    
    However, ReplicationSlotRelease() then continues executing common cleanup code
    that still dereferences the old slot pointer and updates shared memory fields
    such as effective_xmin. If the slot array entry has already been reallocated,
    these writes can inadvertently affect a different, unrelated slot.
    
    I am attaching a patch that avoids touching the slot's shared-memory state
    after dropping an ephemeral slot. It keeps the post-release shared-memory
    updates only for non-ephemeral slots, where the slot remains valid after release.
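    
    For illustration only (this is not the attached patch), the hazard and the
    general shape of the fix can be modelled with a toy, single-process sketch.
    All names below (ToySlot, toy_release_buggy, toy_release_fixed, and so on) are
    hypothetical, the real locking is omitted, and the hook merely stands in for
    "another backend runs right after the drop".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: hypothetical names, no locking, not PostgreSQL code. */
#define TOY_MAX_SLOTS 2
#define TOY_INVALID_PROC (-1)

typedef enum { TOY_PERSISTENT, TOY_EPHEMERAL } ToyPersistency;

typedef struct ToySlot
{
    bool           in_use;
    char           name[64];
    ToyPersistency persistency;
    int            active_proc;     /* owning process, or TOY_INVALID_PROC */
    unsigned       effective_xmin;  /* 0 means "invalid" in this toy */
} ToySlot;

static ToySlot toy_slots[TOY_MAX_SLOTS];

/* Stands in for whatever a concurrent backend does after the drop. */
typedef void (*ToyHook)(void);

static void toy_reset(void)
{
    memset(toy_slots, 0, sizeof(toy_slots));
}

/* Claim the first free array entry, as a new slot creation would. */
static ToySlot *toy_create(const char *name, ToyPersistency p,
                           int proc, unsigned xmin)
{
    for (int i = 0; i < TOY_MAX_SLOTS; i++)
    {
        if (!toy_slots[i].in_use)
        {
            toy_slots[i].in_use = true;
            snprintf(toy_slots[i].name, sizeof(toy_slots[i].name), "%s", name);
            toy_slots[i].persistency = p;
            toy_slots[i].active_proc = proc;
            toy_slots[i].effective_xmin = xmin;
            return &toy_slots[i];
        }
    }
    return NULL;
}

/* Simulated second backend: creates a slot, reusing the freed entry. */
static void toy_other_backend_creates(void)
{
    (void) toy_create("test_slot_created", TOY_PERSISTENT, 2, 100);
}

/* Buggy shape: common cleanup still writes through the stale pointer. */
static void toy_release_buggy(ToySlot *slot, ToyHook after_drop)
{
    if (slot->persistency == TOY_EPHEMERAL)
    {
        slot->in_use = false;       /* entry is now free for reuse */
        if (after_drop)
            after_drop();
    }
    slot->effective_xmin = 0;       /* may hit an unrelated slot */
    slot->active_proc = TOY_INVALID_PROC;
}

/* Fixed shape: after dropping, do not touch the entry again. */
static void toy_release_fixed(ToySlot *slot, ToyHook after_drop)
{
    if (slot->persistency == TOY_EPHEMERAL)
    {
        slot->in_use = false;
        if (after_drop)
            after_drop();
        return;
    }
    slot->effective_xmin = 0;       /* safe: slot is still ours */
    slot->active_proc = TOY_INVALID_PROC;
}
```

    In this model, releasing an ephemeral slot through toy_release_buggy() with
    toy_other_backend_creates as the hook leaves 'test_slot_created' with an
    invalid active_proc, whereas toy_release_fixed() leaves it untouched.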
    
    To reproduce, we can use the following steps:
    
    1. Attach gdb to the backend and set a breakpoint in ReplicationSlotRelease()
       right after ReplicationSlotDropAcquired() is called.
    2. Create an ephemeral slot in the above backend with an invalid output plugin:
       SELECT pg_create_logical_replication_slot('test_slot_dropped', 'pgoutput2', false, false, true);
    3. Once the breakpoint is hit, start another backend and create a new slot
       named 'test_slot_created'.
    4. Release the breakpoint and allow the first backend to continue. At this
       point, you will see it resetting active_proc (and effective_xmin, if a
       snapshot is being exported) of the new slot 'test_slot_created' to
       invalid values.
    5. Start a third backend and attempt to acquire the same slot
       'test_slot_created'. This should not be possible under normal
       circumstances, but the bug allows it.
    
    I haven't attached a test for this fix, as the change is straightforward and the
    likelihood of encountering this bug is low, so it may not be worth adding test
    cycles for it. However, if others feel differently, I'm happy to add one.
    
    Best Regards,
    Hou zj