Thread

  1. RE: Newly created replication slot may be invalidated by checkpoint

    Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> — 2025-12-04T06:42:34Z

    On Thursday, December 4, 2025 1:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > 
    > On Thu, Dec 4, 2025 at 2:04 AM Masahiko Sawada
    > <sawada.mshk@gmail.com> wrote:
    > >
    > > On Tue, Dec 2, 2025 at 10:15 PM Zhijie Hou (Fujitsu)
    > > <houzj.fnst@fujitsu.com> wrote:
    > > >
    > > > I think the invalidation cannot occur when copying because:
    > > >
    > > > Currently, there are no CHECK_FOR_INTERRUPTS() calls between the
    > > > initial restart_lsn copy (first phase) and the latest restart_lsn copy (second phase).
    > > > As a result, even if a checkpoint attempts to invalidate a slot and
    > > > sends a SIGTERM to the backend, the backend will first update the
    > > > restart_lsn during the second phase before responding to the signal.
    > > > Consequently, during the next cycle of
    > > > InvalidatePossiblyObsoleteSlot(), the checkpoint will observe the updated 
    > > > restart_lsn and skip the invalidation.
    > > >
    > > > For logical slots, although invoking the output plugin startup
    > > > callback presents a slight chance of processing the signal (when
    > > > using third-party plugins), slot invalidation in this scenario
    > > > results in immediate slot dropping, because the slot is in RS_EPHEMERAL
    > state, thus preventing invalidation.
    > >
    > > Thank you for the analysis. I agree.
    > >
    > > > While theoretically, slot invalidation could occur if the code
    > > > changes in the future, addressing that possibility could be
    > > > considered an independent improvement task. What do you think ?
    > >
    > > Okay. I find that it also might make sense for HEAD to use
    > > RS_EPHEMERAL state for physical slots too to avoid being invalidated
    > > during creation, which probably can be discussed later. For back
    > > branches, the proposed idea of acquiring ReplicationSlotAllocationLock
    > > in an exclusive mode would be better. I think we might want to have a
    > > comment in CheckPointReplicationSlots() too that refers to
    > > ReplicationSlotReserveWal().
    > >
    > > Regarding whether we revert the original fix 2090edc6f32 and make it
    > > the same as we did in HEAD ca307d5cec90a4f, we need to change the size
    > > of ReplicationSlot struct. I'm concerned that it's really safe to
    > > change it because the data resides on the shared memory. For example,
    > > we typically iterate over all replication slots as follow:
    > >
    > > for (i = 0; i < max_replication_slots; i++) {
    > >     ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
    > >
    > > I'm concerned that the arithmetic for calculating the slot address is
    > > affected by the size of ReplicationSlot change.
    > >
    > 
    > Yes, this is a valid concern. I think we can go-ahead with fixing the 0001's-fix
    > in HEAD and 18. We can discuss separately the fix for back-branches prior to
    > 18.
    
    Here are the updated patches for HEAD and 18. I did not add tests since, after
    applying the patch and resolving the issue, the only observable behavior is that
    the checkpoint will wait for another backend to create a slot due to the lwlock
    lock, so it seems not worth to test solely lwlock wait event (I could not find similar
    tests).
    
    Best Regards,
    Hou zj