Thread

  1. RE: Newly created replication slot may be invalidated by checkpoint

    Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> — 2025-12-08T10:24:46Z

    On Monday, December 8, 2025 5:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > 
    > On Mon, Dec 8, 2025 at 12:53 PM Masahiko Sawada
    > <sawada.mshk@gmail.com> wrote:
    > >
    > > On Fri, Dec 5, 2025 at 4:10 AM Amit Kapila <amit.kapila16@gmail.com>
    > wrote:
    > > >
    > > > On Thu, Dec 4, 2025 at 12:12 PM Zhijie Hou (Fujitsu)
    > > > <houzj.fnst@fujitsu.com> wrote:
    > > > >
    > > > > Here are the updated patches for HEAD and 18. I did not add tests
    > > > > since, after applying the patch and resolving the issue, the only
    > > > > observable behavior is that the checkpoint waits on the LWLock until
    > > > > another backend finishes creating a slot, so it did not seem worthwhile
    > > > > to add a test solely for an LWLock wait event (I could not find similar tests).
    > > > >
    > > >
    > > > Fair enough. The patch looks mostly good to me, attached are minor
    > > > comment improvements atop the HEAD patch. I'll do some more testing
    > > > before push.
    > > >
    > > > Sawada-san/Vitaly, do you have any opinion on patch or the direction
    > > > to fix? The idea is to get this fixed for HEAD and 18, then continue
    > > > the discussion for the other back-branches and the remaining patches.
    > >
    > > +1
    > >
    > 
    > Thanks, pushed. I'll continue thinking on how to fix it in branches prior to 18
    > and other problems reported in this thread.
    
    Thanks for pushing. I thought about whether a similar fix could be applied
    to the back-branches, and one approach could be to take ReplicationSlotAllocationLock
    in two places: acquire it in exclusive mode during WAL reservation, and in shared
    mode during the minimum LSN calculation at checkpoints, to serialize the two.
    
    The logic is similar to HEAD: it ensures that, if WAL reservation
    occurs first, the checkpoint waits until restart_lsn is updated before
    calculating the minimum LSN. If the checkpoint runs first, subsequent WAL
    reservations pick a position at or after the latest checkpoint's redo pointer.
    
    Here is the patch based on PG17 for reference.
    
    Best Regards,
    Hou zj