Thread

  1. Re: Newly created replication slot may be invalidated by checkpoint

    Masahiko Sawada <sawada.mshk@gmail.com> — 2025-12-30T00:01:18Z

    On Sun, Dec 14, 2025 at 8:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >
    > On Thu, Dec 11, 2025 at 12:39 PM Zhijie Hou (Fujitsu)
    > <houzj.fnst@fujitsu.com> wrote:
    > >
    > > >
    > > > The other idea to fix this problem is suggested by Alexander in his
    > > > email [1] which is to introduce a new ReplicationSlotReserveWALLock
    > > > for this purpose. I think introducing LWLock in back branches could be
    > > > questionable. Did you evaluate the pros and cons of using that
    > > > approach?
    > >
    > > I reviewed that approach, and I think the main distinction lies in whether to
    > > use a new LWLock to serialize the process or rely on an existing lock.
    > > Introducing a new LWLock in back branches would alter the size of
    > > MainLWLockArray and affect NUM_INDIVIDUAL_LWLOCKS/LWTRANCHE_FIRST_USER_DEFINED.
    > > Although this may not directly impact user applications since users typically
    > > use standard APIs like RequestNamedLWLockTranche and LWLockNewTrancheId to add
    > > private LWLocks, it still has a slight risk. Additionally, using an existing
    > > lock could keep code similarity with the HEAD, which can be helpful for future
    > > bug fixes and analysis.
    > >
    >
    > Fair enough. I'll wait for Sawada-san/Vitaly to see what their opinion
    > on this matter is.
    
    While it's hacky that the proposed approach takes
    ReplicationSlotAllocationLock before
    XLogGetReplicationSlotMinimumLSN() during checkpoint, I find that it's
    better than introducing a new LWLock in minor versions which could
    lead unexpected compatibility issues.
    
    Regarding the v10 patch, do we need to take
    ReplicationSlotAllocationLock also at the following place?
    
            /*
             * Recalculate the current minimum LSN to be used in the WAL segment
             * cleanup.  Then, we must synchronize the replication slots again in
             * order to make this LSN safe to use.
             */
            slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
            CheckPointReplicationSlots(shutdown);
    
    I think we need to add some comments regardless of taking the lwlock.
    
    Regards,
    
    -- 
    Masahiko Sawada
    Amazon Web Services: https://aws.amazon.com