Thread
-
Re: [PATCH] Release replication slot on error in SQL-callable slot functions
Fujii Masao <masao.fujii@gmail.com> — 2026-05-11T03:01:33Z
On Sun, May 10, 2026 at 5:45 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote: > > Hi Hackers, > > SQL-callable replication slot functions acquire a slot (setting > the process-global MyReplicationSlot) but can then ERROR before reaching > ReplicationSlotRelease(). If such an error is caught by a PL/pgSQL > EXCEPTION block (which uses a subtransaction), MyReplicationSlot remains > set because there is no subtransaction-level cleanup hook for replication > slots. > > Any subsequent slot operation in the same session then hits > Assert(MyReplicationSlot == NULL) and crashes the backend on assert > enabled builds. In release builds the stale MyReplicationSlot is silently overwritten, > permanently orphaning the old slot as "active." The orphaned slot blocks any other > session from acquiring it, vacuum and WAL deletion. > > Repro: > > SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding'); > > DO $$ BEGIN > PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn); > EXCEPTION WHEN others THEN > RAISE NOTICE 'caught: %', SQLERRM; > END $$; > > SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL, NULL); > > 2026-05-09 19:45:06.619 UTC [1096805] STATEMENT: SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding'); > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 1096805 > > > Attached a patch to address this by wrapping error-prone paths in PG_TRY/PG_CATCH blocks > and call ReplicationSlotRelease(). Thanks for the report and the patch! I think wrapping the slot-processing code with PG_TRY()/PG_CATCH() seems a good direction for addressing the issue you reported. + PG_CATCH(); + { + ReplicationSlotRelease(); When create_logical_replication_slot() is called with temporary = true, the created logical replication slot has RS_TEMPORARY persistency. Such a slot is not dropped by ReplicationSlotRelease(), whereas an RS_EPHEMERAL slot is dropped via ReplicationSlotDropAcquired(). So even with the v1 patch, a temporary logical replication slot can remain unexpectedly if pg_create_logical_replication_slot() throws an error. In this case, should create_logical_replication_slot() explicitly drop the slot with ReplicationSlotDropAcquired(), or temporarily change the slot persistency to RS_EPHEMERAL before calling ReplicationSlotRelease()? Does a newly created logical replication slot created by pg_copy_logical_replication_slot() have the same issue? + PG_CATCH(); + { + ReplicationSlotRelease(); Should ReplicationSlotRelease() be called only when MyReplicationSlot is not NULL? /* Acquire the slot so we "own" it */ ReplicationSlotAcquire(NameStr(*slotname), true, true); - /* A slot whose restart_lsn has never been reserved cannot be advanced */ - if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn)) - ereport(ERROR, - (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("replication slot \"%s\" cannot be advanced", - NameStr(*slotname)), - errdetail("This slot has never previously reserved WAL, or it has been invalidated."))); + PG_TRY(); + { + /* A slot whose restart_lsn has never been reserved cannot be advanced */ + if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn)) Shouldn't ReplicationSlotAcquire() also be moved inside the PG_TRY() block? Because it can throw an error after setting MyReplicationSlot. Regards, -- Fujii Masao