Thread

  1. Re: [PATCH] Release replication slot on error in SQL-callable slot functions

    vignesh C <vignesh21@gmail.com> — 2026-05-21T06:49:42Z

    On Mon, 11 May 2026 at 08:31, Fujii Masao <masao.fujii@gmail.com> wrote:
    >
    > On Sun, May 10, 2026 at 5:45 AM SATYANARAYANA NARLAPURAM
    > <satyanarlapuram@gmail.com> wrote:
    > >
    > > Hi Hackers,
    > >
    > > SQL-callable replication slot functions acquire a slot (setting
    > > the process-global MyReplicationSlot) but can then ERROR before reaching
    > > ReplicationSlotRelease(). If such an error is caught by a PL/pgSQL
    > > EXCEPTION block (which uses a subtransaction), MyReplicationSlot remains
    > > set because there is no subtransaction-level cleanup hook for replication
    > > slots.
    > >
    > > Any subsequent slot operation in the same session then hits
    > > Assert(MyReplicationSlot == NULL) and crashes the backend on
    > > assert-enabled builds. In release builds the stale MyReplicationSlot is silently
    > > overwritten, permanently orphaning the old slot as "active". The orphaned slot
    > > cannot be acquired by any other session, and it also blocks vacuum and WAL deletion.
    > >
    > > Repro:
    > >
    > > SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
    > >
    > > DO $$ BEGIN
    > >     PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn);
    > > EXCEPTION WHEN others THEN
    > >     RAISE NOTICE 'caught: %', SQLERRM;
    > > END $$;
    > >
    > > SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL, NULL);
    > >
    > > 2026-05-09 19:45:06.619 UTC [1096805] STATEMENT:  SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
    > > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 1096805
    > >
    > >
    > > Attached is a patch that addresses this by wrapping error-prone paths in
    > > PG_TRY/PG_CATCH blocks and calling ReplicationSlotRelease() on error.
    >
    > Thanks for the report and the patch!
    >
    > I think wrapping the slot-processing code with PG_TRY()/PG_CATCH() is
    > a good direction for addressing the issue you reported.
    >
    >
    > +   PG_CATCH();
    > +   {
    > +       ReplicationSlotRelease();
    >
    > When create_logical_replication_slot() is called with temporary = true,
    > the created logical replication slot has RS_TEMPORARY persistency. Such a slot
    > is not dropped by ReplicationSlotRelease(), whereas an RS_EPHEMERAL slot is
    > dropped via ReplicationSlotDropAcquired().
    >
    > So even with the v1 patch, a temporary logical replication slot can remain
    > unexpectedly if pg_create_logical_replication_slot() throws an error.
    > In this case, should create_logical_replication_slot() explicitly drop the slot
    > with ReplicationSlotDropAcquired(), or temporarily change the slot persistency
    > to RS_EPHEMERAL before calling ReplicationSlotRelease()?
    >
    >
    > Does a logical replication slot newly created by
    > pg_copy_logical_replication_slot() have the same issue?
    
    Additionally, pg_logical_slot_get_changes() has the same issue. It
    can be reproduced with the following:
    SELECT pg_create_logical_replication_slot('test_slot_1', 'test_decoding');
    
    DO $$
    BEGIN
        -- This will ERROR if pg_logical_slot_get_changes() fails for the slot.
        PERFORM 1 FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL,
                                                   'nonexistent-option', 'val');
    EXCEPTION WHEN others THEN
        RAISE NOTICE 'caught: %', SQLERRM;
    END $$;
    
    SELECT count(*) FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL);
    
    TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 80308
    postgres: vignesh postgres [local] SELECT(ExceptionalCondition+0xba) [0x642e7b2ebae1]
    postgres: vignesh postgres [local] SELECT(ReplicationSlotAcquire+0x6e) [0x642e7b00d732]
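
    To illustrate the failure mode and the catch-and-release approach discussed
    above, here is a minimal standalone model in plain C. It is only a sketch and
    not PostgreSQL code: Slot, my_slot, slot_acquire(), slot_release(),
    raise_error(), run_catching() and the two advance_*() functions are all
    hypothetical names, setjmp/longjmp stands in for the PG_TRY()/PG_CATCH()
    machinery, and slot_acquire() returns -1 at the point where the real code
    would trip the assertion.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL definitions. */
typedef struct Slot
{
    const char *name;
    int         active;
} Slot;

static Slot    *my_slot = NULL;    /* models the process-global MyReplicationSlot */
static jmp_buf *error_ctx = NULL;  /* innermost installed error handler */

/* Returns -1 where Assert(MyReplicationSlot == NULL) would fire. */
static int
slot_acquire(Slot *s)
{
    if (my_slot != NULL)
        return -1;
    s->active = 1;
    my_slot = s;
    return 0;
}

static void
slot_release(void)
{
    if (my_slot != NULL)
    {
        my_slot->active = 0;
        my_slot = NULL;
    }
}

/* Models ERROR: jump to the innermost handler. */
static void
raise_error(void)
{
    assert(error_ctx != NULL);
    longjmp(*error_ctx, 1);
}

/* Acquires the slot, then fails before any release can run. */
static void
advance_unprotected(Slot *s)
{
    slot_acquire(s);
    raise_error();
}

/* Same failure, but the error path releases the slot and re-throws. */
static void
advance_protected(Slot *s)
{
    jmp_buf     local;
    jmp_buf    *saved = error_ctx;

    slot_acquire(s);
    if (setjmp(local) == 0)
    {
        error_ctx = &local;
        raise_error();          /* the failing step */
    }
    else
    {
        error_ctx = saved;      /* restore the outer handler */
        slot_release();         /* cleanup on the error path */
        raise_error();          /* propagate to the outer handler */
    }
}

/* Models an EXCEPTION block: run fn and swallow its error. Returns 1 if caught. */
static int
run_catching(void (*fn) (Slot *), Slot *s)
{
    jmp_buf     outer;
    jmp_buf    *saved = error_ctx;
    int         caught = 0;

    if (setjmp(outer) == 0)
    {
        error_ctx = &outer;
        fn(s);
    }
    else
        caught = 1;
    error_ctx = saved;
    return caught;
}
```

    In this model, calling run_catching(advance_unprotected, &slot) leaves
    my_slot set and the slot marked active, so the next slot_acquire() fails.
    With advance_protected the global is cleared before the error reaches the
    outer handler. The model does not cover the RS_TEMPORARY case raised by
    Fujii-san, where releasing the slot alone would not drop it.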
    
    Regards,
    Vignesh