Thread

  1. Re: [PATCH] Release replication slot on error in SQL-callable slot functions

    SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> — 2026-05-21T14:38:53Z

    Hi
    
    On Wed, May 20, 2026 at 11:49 PM vignesh C <vignesh21@gmail.com> wrote:
    
    > On Mon, 11 May 2026 at 08:31, Fujii Masao <masao.fujii@gmail.com> wrote:
    > >
    > > On Sun, May 10, 2026 at 5:45 AM SATYANARAYANA NARLAPURAM
    > > <satyanarlapuram@gmail.com> wrote:
    > > >
    > > > Hi Hackers,
    > > >
    > > > SQL-callable replication slot functions acquire a slot (setting
    > > > the process-global MyReplicationSlot) but can then ERROR before
    > > > reaching ReplicationSlotRelease(). If such an error is caught by a
    > > > PL/pgSQL EXCEPTION block (which uses a subtransaction),
    > > > MyReplicationSlot remains set because there is no
    > > > subtransaction-level cleanup hook for replication slots.
    > > >
    > > > Any subsequent slot operation in the same session then hits
    > > > Assert(MyReplicationSlot == NULL) and crashes the backend on
    > > > assert-enabled builds. In release builds the stale MyReplicationSlot
    > > > is silently overwritten, permanently orphaning the old slot as
    > > > "active." The orphaned slot blocks any other session from acquiring
    > > > it and also blocks vacuum and WAL deletion.
    > > >
    > > > Repro:
    > > >
    > > > SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
    > > >
    > > > DO $$ BEGIN
    > > >     PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn);
    > > > EXCEPTION WHEN others THEN
    > > >     RAISE NOTICE 'caught: %', SQLERRM;
    > > > END $$;
    > > >
    > > > SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL, NULL);
    > > >
    > > > 2026-05-09 19:45:06.619 UTC [1096805] STATEMENT:  SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
    > > > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 1096805
    > > >
    > > >
    > > > Attached is a patch that addresses this by wrapping the error-prone
    > > > paths in PG_TRY/PG_CATCH blocks and calling ReplicationSlotRelease().
    > >
    > > Thanks for the report and the patch!
    > >
    > > I think wrapping the slot-processing code with PG_TRY()/PG_CATCH() is
    > > a good direction for addressing the issue you reported.
    > >
    > >
    > > +   PG_CATCH();
    > > +   {
    > > +       ReplicationSlotRelease();
    > >
    > > When create_logical_replication_slot() is called with temporary = true,
    > > the created logical replication slot has RS_TEMPORARY persistency. Such
    > > a slot is not dropped by ReplicationSlotRelease(), whereas an
    > > RS_EPHEMERAL slot is dropped via ReplicationSlotDropAcquired().
    > >
    > > So even with the v1 patch, a temporary logical replication slot can
    > > remain unexpectedly if pg_create_logical_replication_slot() throws an
    > > error. In this case, should create_logical_replication_slot() explicitly
    > > drop the slot with ReplicationSlotDropAcquired(), or temporarily change
    > > the slot persistency to RS_EPHEMERAL before calling
    > > ReplicationSlotRelease()?
    > >
    > >
    > > Does a logical replication slot newly created by
    > > pg_copy_logical_replication_slot() have the same issue?
    >
    > Additionally, pg_logical_slot_get_changes() has the same issue; it can
    > be reproduced with the following:
    > SELECT pg_create_logical_replication_slot('test_slot_1', 'test_decoding');
    >
    > DO $$
    > BEGIN
    >     -- This will ERROR if pg_logical_slot_get_changes() fails for the slot.
    >     PERFORM 1 FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL, 'nonexistent-option', 'val');
    > EXCEPTION WHEN others THEN
    >     RAISE NOTICE 'caught: %', SQLERRM;
    > END $$;
    >
    > SELECT count(*) FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL);
    >
    > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 80308
    > postgres: vignesh postgres [local] SELECT(ExceptionalCondition+0xba) [0x642e7b2ebae1]
    > postgres: vignesh postgres [local] SELECT(ReplicationSlotAcquire+0x6e) [0x642e7b00d732]
    
    
    Thank you for letting me know. I will fix these cases in the next update
    and send it shortly.
    
    Thanks,
    Satya
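
    For readers following the thread, the cleanup pattern under discussion
    can be sketched outside PostgreSQL. The following is a minimal,
    self-contained C model, not the patch itself: Slot, my_slot,
    error_handler, slot_acquire(), slot_release(), raise_error(),
    slot_advance_guarded() and call_catching_errors() are all hypothetical
    stand-ins. setjmp/longjmp plays the role of PG_TRY/PG_CATCH, and
    call_catching_errors() plays the role of a PL/pgSQL EXCEPTION block.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL internals; this is not the real API. */
typedef struct Slot
{
    const char *name;
    int         active;
} Slot;

static Slot    *my_slot = NULL;        /* models MyReplicationSlot */
static jmp_buf *error_handler = NULL;  /* innermost active error handler */

static void
slot_acquire(Slot *s)
{
    assert(my_slot == NULL);    /* models Assert(MyReplicationSlot == NULL) */
    s->active = 1;
    my_slot = s;
}

static void
slot_release(void)
{
    if (my_slot != NULL)
    {
        my_slot->active = 0;
        my_slot = NULL;
    }
}

/* Models ereport(ERROR): jump to the innermost handler. */
static void
raise_error(void)
{
    assert(error_handler != NULL);
    longjmp(*error_handler, 1);
}

/* Slot function with cleanup: release the slot on error, then re-throw. */
static void
slot_advance_guarded(Slot *s, int fail)
{
    jmp_buf     local;
    jmp_buf    *saved = error_handler;

    slot_acquire(s);
    if (setjmp(local) == 0)
    {
        error_handler = &local;
        if (fail)
            raise_error();      /* the error-prone work */
        error_handler = saved;
    }
    else
    {
        error_handler = saved;
        slot_release();         /* the cleanup the thread is about */
        raise_error();          /* re-throw to the outer handler */
    }
    slot_release();             /* normal exit path */
}

/* Models a caller that traps the error and keeps the session alive. */
static int
call_catching_errors(Slot *s, int fail)
{
    jmp_buf     local;
    jmp_buf    *saved = error_handler;
    int         caught = 0;

    if (setjmp(local) == 0)
    {
        error_handler = &local;
        slot_advance_guarded(s, fail);
    }
    else
        caught = 1;
    error_handler = saved;
    return caught;
}
```

    Calling call_catching_errors(&s, 1) and then call_catching_errors(&s, 0)
    on the same Slot does not trip the assertion in slot_acquire(), because
    the guarded function releases the slot before re-throwing. Without the
    release in the error branch, the second call would abort, which is the
    failure mode reported above.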
    