Thread

  1. RE: Newly created replication slot may be invalidated by checkpoint

    Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> — 2025-12-11T07:09:21Z

    
    > -----Original Message-----
    > From: Amit Kapila <amit.kapila16@gmail.com>
    > Sent: Tuesday, December 9, 2025 7:33 PM
    > To: Hou, Zhijie/侯 志杰 <houzj.fnst@fujitsu.com>
    > Cc: Vitaly Davydov <v.davydov@postgrespro.ru>; pgsql-
    > hackers@lists.postgresql.org; suyu.cmj <mengjuan.cmj@alibaba-inc.com>;
    > tomas <tomas@vondra.me>; michael <michael@paquier.xyz>;
    > bharath.rupireddyforpostgres <bharath.rupireddyforpostgres@gmail.com>;
    > Alexander Korotkov <aekorotkov@gmail.com>; Masahiko Sawada
    > <sawada.mshk@gmail.com>
    > Subject: Re: Newly created replication slot may be invalidated by checkpoint
    > 
    > On Mon, Dec 8, 2025 at 3:54 PM Zhijie Hou (Fujitsu)
    > <houzj.fnst@fujitsu.com> wrote:
    > >
    > > On Monday, December 8, 2025 5:47 PM Amit Kapila
    > <amit.kapila16@gmail.com> wrote:
    > > >
    > > > On Mon, Dec 8, 2025 at 12:53 PM Masahiko Sawada
    > > > <sawada.mshk@gmail.com> wrote:
    > > > >
    > > > > On Fri, Dec 5, 2025 at 4:10 AM Amit Kapila <amit.kapila16@gmail.com>
    > > > wrote:
    > > > > >
    > > > > > On Thu, Dec 4, 2025 at 12:12 PM Zhijie Hou (Fujitsu)
    > > > > > <houzj.fnst@fujitsu.com> wrote:
    > > > > > >
    > > > > > > Here are the updated patches for HEAD and 18. I did not add tests
    > > > > > > since, after applying the patch and resolving the issue, the only
    > > > > > > observable behavior is that the checkpoint will wait for another
    > > > > > > backend to create a slot due to the lwlock lock, so it seems not
    > > > > > > worth to test solely lwlock wait event (I could not find similar tests).
    > > > > > >
    > > > > >
    > > > > > Fair enough. The patch looks mostly good to me, attached are minor
    > > > > > comment improvements atop the HEAD patch. I'll do some more
    > testing
    > > > > > before push.
    > > > > >
    > > > > > Sawada-san/Vitaly, do you have any opinion on patch or the direction
    > > > > > to fix? The idea is to get this fixed for HEAD and 18, then continue
    > > > > > discussion for other bank-branches and the remaining patches.
    > > > >
    > > > > +1
    > > > >
    > > >
    > > > Thanks, Pushed. I'll continue thinking on how to fix it in branches prior to
    > 18
    > > > and other problems reported in this thread.
    > >
    > > Thanks for pushing. I thought about whether it's possible to apply a similar
    > fix
    > > to back-branches and one approach could be to take
    > ReplicationSlotAllocationLock
    > > at two places. E.g., acquire an exclusive lock WAL reservation, and a shared
    > > lock during the minimum LSN calculation at checkpoints to serialize the
    > process.
    > >
    > 
    > + *
    > + * Additionally, acquiring the Allocation lock to serialize the minimum LSN
    > + * calculation with concurrent slot WAL reservation. This ensures that the
    > + * WAL position being reserved is either included in the miminum LSN or is
    > + * beyond or equal to the redo pointer of the current checkpoint (See
    > + * ReplicationSlotReserveWal for details).
    >   */
    > + LWLockAcquire(ReplicationSlotAllocationLock, LW_SHARED);
    >   slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
    > + LWLockRelease(ReplicationSlotAllocationLock);
    > 
    > Yeah, this will fix the reported issue but doesn't it look odd to take
    > an unrelated lock here? I mean it appears that if someone has to call
    > XLogGetReplicationSlotMinimumLSN(), they should acquire
    > ReplicationSlotAllocationLock in LW_SHARED mode? If we want to go in
    > this direction and don't have better ideas to fix then we should add
    > comments suggesting this is a special case and shouldn't be used as an
    > example for other places.
    
    I tried to add some comments in v10 patch.
    
    > 
    > The other idea to fix this problem is suggested by Alexander in his
    > email [1] which is to introduce a new ReplicationSlotReserveWALLock
    > for this purpose. I think introducing LWLock in back branches could be
    > questionable. Did you evaluate the pros and cons of using that
    > approach?
    
    I reviewed that approach, and I think the main distinction lies in whether to
    use a new LWLock to serialize the process or rely on an existing lock.
    Introducing a new LWLock in back branches would alter the size of
    MainLWLockArray and affect NUM_INDIVIDUAL_LWLOCKS/LWTRANCHE_FIRST_USER_DEFINED.
    Although this may not directly impact user applications since users typically
    use standard APIs like RequestNamedLWLockTranche and LWLockNewTrancheId to add
    private LWLocks, it still has a slight risk. Additionally, using an existing
    lock could keep code similarity with the HEAD, which can be helpful for future
    bug fixes and analysis.
    
    > Yet, another possibility is that we don't fix this in back branches
    > prior to 18 but not sure how frequently it can impact users. Suyu, can
    > you please tell how you found this problem in the first place? Is it
    > via code-review or did you hit this in the production or while doing
    > some related tests?
    > 
    > BTW, I have asked a question regarding commit 2090edc6f32f652a2c in
    > email [2]. Did you get a chance to look at that?
    
    Please refer to the next inline reply.
    
    > +       /*
    > +        * Recalculate the current minimum LSN to be used in the WAL segment
    > +        * cleanup.  Then, we must synchronize the replication slots again in
    > +        * order to make this LSN safe to use.
    > +        */
    > +       slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
    > +       CheckPointReplicationSlots(shutdown);
    > +
    >         /*
    >          * Some slots have been invalidated; recalculate the old-segment
    >          * horizon, starting again from RedoRecPtr.
    >          */
    >         XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
    > -       KeepLogSeg(recptr, &_logSegNo);
    > +       KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
    > 
    > 
    > 
    > After invalidating the slots, we recalculate the slotsMinReqLSN with the
    > latest value of XLogGetReplicationSlotMinimumLSN(). Can't it generate a more
    > recent value of slot's restart_lsn which has not been flushed and we may end
    > up removing the corresponding WAL?
    
    Since CheckPointReplicationSlots() is immediately called after recalculating the
    slot's minimum LSN, it ensures that the dirty slot's restart_lsn is flushed to
    disk before any WAL removal takes place. So, I think only those WALs whose
    removal is based on the flushed restart_lsn value are eliminated.
    
    > 
    > [1] - https://www.postgresql.org/message-
    > id/CAPpHfduZY7_pRCrbLdsLty4zP5x2EDmwk4CYiofiyjdt1iK%2BzA%40mail.gm
    > ail.com
    > [2] - https://www.postgresql.org/message-
    > id/CAA4eK1%2BwrNSee6PKQ0%2BDtUu_W0GdvewskpAEK5EiX6r3E%2B2Sxw
    > %40mail.gmail.com
    
    Best Regards,
    Hou zj