Thread

  1. Re: [PATCH] Release replication slot on error in SQL-callable slot functions

    Fujii Masao <masao.fujii@gmail.com> — 2026-05-11T03:01:33Z

    On Sun, May 10, 2026 at 5:45 AM SATYANARAYANA NARLAPURAM
    <satyanarlapuram@gmail.com> wrote:
    >
    > Hi Hackers,
    >
    > SQL-callable replication slot functions acquire a slot (setting
    > the process-global MyReplicationSlot) but can then ERROR before reaching
    > ReplicationSlotRelease(). If such an error is caught by a PL/pgSQL
    > EXCEPTION block (which uses a subtransaction), MyReplicationSlot remains
    > set because there is no subtransaction-level cleanup hook for replication
    > slots.
    >
    > Any subsequent slot operation in the same session then hits
    > Assert(MyReplicationSlot == NULL) and crashes the backend on assert
    > enabled builds. In release builds the stale MyReplicationSlot is silently overwritten,
    > permanently orphaning the old slot as "active." The orphaned slot blocks any other
    > session from acquiring it, vacuum and WAL deletion.
    >
    > Repro:
    >
    > SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
    >
    > DO $$ BEGIN
    >     PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn);
    > EXCEPTION WHEN others THEN
    >     RAISE NOTICE 'caught: %', SQLERRM;
    > END $$;
    >
    > SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL, NULL);
    >
    > 2026-05-09 19:45:06.619 UTC [1096805] STATEMENT:  SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
    > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 1096805
    >
    >
    > Attached a patch to address this by wrapping error-prone paths in PG_TRY/PG_CATCH blocks
    > and call ReplicationSlotRelease().
    
    Thanks for the report and the patch!
    
    I think wrapping the slot-processing code with PG_TRY()/PG_CATCH() seems
    a good direction for addressing the issue you reported.
    
    
    +   PG_CATCH();
    +   {
    +       ReplicationSlotRelease();
    
    When create_logical_replication_slot() is called with temporary = true,
    the created logical replication slot has RS_TEMPORARY persistency. Such a slot
    is not dropped by ReplicationSlotRelease(), whereas an RS_EPHEMERAL slot is
    dropped via ReplicationSlotDropAcquired().
    
    So even with the v1 patch, a temporary logical replication slot can remain
    unexpectedly if pg_create_logical_replication_slot() throws an error.
    In this case, should create_logical_replication_slot() explicitly drop the slot
    with ReplicationSlotDropAcquired(), or temporarily change the slot persistency
    to RS_EPHEMERAL before calling ReplicationSlotRelease()?
    
    
    Does a newly created logical replication slot created by
    pg_copy_logical_replication_slot() have the same issue?
    
    
    + PG_CATCH();
    + {
    + ReplicationSlotRelease();
    
    Should ReplicationSlotRelease() be called only when MyReplicationSlot
    is not NULL?
    
    
        /* Acquire the slot so we "own" it */
        ReplicationSlotAcquire(NameStr(*slotname), true, true);
    
    -   /* A slot whose restart_lsn has never been reserved cannot be advanced */
    -   if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn))
    -       ereport(ERROR,
    -               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    -                errmsg("replication slot \"%s\" cannot be advanced",
    -                       NameStr(*slotname)),
    -                errdetail("This slot has never previously reserved
    WAL, or it has been invalidated.")));
    +   PG_TRY();
    +   {
    +       /* A slot whose restart_lsn has never been reserved cannot be
    advanced */
    +       if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn))
    
    Shouldn't ReplicationSlotAcquire() also be moved inside the PG_TRY() block?
    Because it can throw an error after setting MyReplicationSlot.
    
    Regards,
    
    -- 
    Fujii Masao