Re: Newly created replication slot may be invalidated by checkpoint
From: Masahiko Sawada <sawada.mshk@gmail.com>
To: Amit Kapila <amit.kapila16@gmail.com>
Cc: "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>,
Vitaly Davydov <v.davydov@postgrespro.ru>, "pgsql-hackers@lists.postgresql.org" <pgsql-hackers@lists.postgresql.org>, "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>, tomas <tomas@vondra.me>, michael <michael@paquier.xyz>, "bharath.rupireddyforpostgres" <bharath.rupireddyforpostgres@gmail.com>, Alexander Korotkov <aekorotkov@gmail.com>
Date: 2025-12-30T00:01:18Z
Lists: pgsql-hackers
On Sun, Dec 14, 2025 at 8:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 11, 2025 at 12:39 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > >
> > > The other idea to fix this problem is suggested by Alexander in his
> > > email [1] which is to introduce a new ReplicationSlotReserveWALLock
> > > for this purpose. I think introducing LWLock in back branches could be
> > > questionable. Did you evaluate the pros and cons of using that
> > > approach?
> >
> > I reviewed that approach, and I think the main distinction lies in whether to
> > use a new LWLock to serialize the process or rely on an existing lock.
> > Introducing a new LWLock in back branches would alter the size of
> > MainLWLockArray and affect NUM_INDIVIDUAL_LWLOCKS/LWTRANCHE_FIRST_USER_DEFINED.
> > Although this may not directly impact user applications since users typically
> > use standard APIs like RequestNamedLWLockTranche and LWLockNewTrancheId to add
> > private LWLocks, it still has a slight risk. Additionally, using an existing
> > lock could keep code similarity with the HEAD, which can be helpful for future
> > bug fixes and analysis.
> >
>
> Fair enough. I'll wait for Sawada-san/Vitaly to see what their opinion
> on this matter is.

While it's hacky that the proposed approach takes
ReplicationSlotAllocationLock before
XLogGetReplicationSlotMinimumLSN() during checkpoint, I find it
better than introducing a new LWLock in minor versions, which could
lead to unexpected compatibility issues.

Regarding the v10 patch, do we need to take
ReplicationSlotAllocationLock also at the following place?

    /*
     * Recalculate the current minimum LSN to be used in the WAL segment
     * cleanup. Then, we must synchronize the replication slots again in
     * order to make this LSN safe to use.
     */
    slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
    CheckPointReplicationSlots(shutdown);

I think we need to add some comments regardless of taking the lwlock.
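
To make the serialization under discussion concrete, here is a toy model
of the idea, not the v10 patch and not PostgreSQL code. Every toy_* name
is hypothetical: a pthread mutex stands in for
ReplicationSlotAllocationLock, toy_min_lsn() stands in for
XLogGetReplicationSlotMinimumLSN(), and LSNs are plain integers. For
simplicity the toy holds the lock across both the minimum computation and
the cutoff update; the actual patch may scope the lock differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef uint64_t ToyLSN;
#define TOY_INVALID_LSN ((ToyLSN) 0)
#define TOY_MAX_SLOTS 4

/* Hypothetical stand-in for ReplicationSlotAllocationLock. */
static pthread_mutex_t toy_alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-slot reserved position; TOY_INVALID_LSN means nothing reserved. */
static ToyLSN toy_restart_lsn[TOY_MAX_SLOTS];

/* Oldest WAL position that has not been removed yet. */
static ToyLSN toy_oldest_kept = 1;

/* Minimum over all slots that have reserved WAL (caller holds the lock). */
static ToyLSN
toy_min_lsn(void)
{
    ToyLSN min = TOY_INVALID_LSN;

    for (int i = 0; i < TOY_MAX_SLOTS; i++)
    {
        ToyLSN lsn = toy_restart_lsn[i];

        if (lsn != TOY_INVALID_LSN && (min == TOY_INVALID_LSN || lsn < min))
            min = lsn;
    }
    return min;
}

/*
 * Slot-creation side: reserve WAL under the lock.
 * Returns 0 on success, -1 if the requested WAL is already gone.
 */
static int
toy_reserve_wal(int slot, ToyLSN lsn)
{
    int rc = -1;

    pthread_mutex_lock(&toy_alloc_lock);
    if (lsn >= toy_oldest_kept)
    {
        toy_restart_lsn[slot] = lsn;
        rc = 0;
    }
    pthread_mutex_unlock(&toy_alloc_lock);
    return rc;
}

/*
 * Checkpoint side: compute the slots' minimum and advance the removal
 * cutoff under the same lock, so no reservation can slip in between.
 * Returns the oldest position kept after this checkpoint.
 */
static ToyLSN
toy_checkpoint(ToyLSN redo)
{
    ToyLSN min, cutoff, kept;

    pthread_mutex_lock(&toy_alloc_lock);
    min = toy_min_lsn();
    cutoff = (min != TOY_INVALID_LSN && min < redo) ? min : redo;
    if (cutoff > toy_oldest_kept)
        toy_oldest_kept = cutoff;
    kept = toy_oldest_kept;
    pthread_mutex_unlock(&toy_alloc_lock);
    return kept;
}
```

In this model a reservation either completes before the checkpoint reads
the minimum, and is therefore honored by the cutoff, or it runs after the
cutoff has moved and fails cleanly instead of pointing at removed WAL.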
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com