Thread
-
Re: [PATCH] Release replication slot on error in SQL-callable slot functions
vignesh C <vignesh21@gmail.com> — 2026-05-21T06:49:42Z
On Mon, 11 May 2026 at 08:31, Fujii Masao <masao.fujii@gmail.com> wrote: > > On Sun, May 10, 2026 at 5:45 AM SATYANARAYANA NARLAPURAM > <satyanarlapuram@gmail.com> wrote: > > > > Hi Hackers, > > > > SQL-callable replication slot functions acquire a slot (setting > > the process-global MyReplicationSlot) but can then ERROR before reaching > > ReplicationSlotRelease(). If such an error is caught by a PL/pgSQL > > EXCEPTION block (which uses a subtransaction), MyReplicationSlot remains > > set because there is no subtransaction-level cleanup hook for replication > > slots. > > > > Any subsequent slot operation in the same session then hits > > Assert(MyReplicationSlot == NULL) and crashes the backend on assert > > enabled builds. In release builds the stale MyReplicationSlot is silently overwritten, > > permanently orphaning the old slot as "active." The orphaned slot blocks any other > > session from acquiring it, vacuum and WAL deletion. > > > > Repro: > > > > SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding'); > > > > DO $$ BEGIN > > PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn); > > EXCEPTION WHEN others THEN > > RAISE NOTICE 'caught: %', SQLERRM; > > END $$; > > > > SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL, NULL); > > > > 2026-05-09 19:45:06.619 UTC [1096805] STATEMENT: SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding'); > > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 1096805 > > > > > > Attached a patch to address this by wrapping error-prone paths in PG_TRY/PG_CATCH blocks > > and call ReplicationSlotRelease(). > > Thanks for the report and the patch! > > I think wrapping the slot-processing code with PG_TRY()/PG_CATCH() seems > a good direction for addressing the issue you reported. > > > + PG_CATCH(); > + { > + ReplicationSlotRelease(); > > When create_logical_replication_slot() is called with temporary = true, > the created logical replication slot has RS_TEMPORARY persistency. Such a slot > is not dropped by ReplicationSlotRelease(), whereas an RS_EPHEMERAL slot is > dropped via ReplicationSlotDropAcquired(). > > So even with the v1 patch, a temporary logical replication slot can remain > unexpectedly if pg_create_logical_replication_slot() throws an error. > In this case, should create_logical_replication_slot() explicitly drop the slot > with ReplicationSlotDropAcquired(), or temporarily change the slot persistency > to RS_EPHEMERAL before calling ReplicationSlotRelease()? > > > Does a newly created logical replication slot created by > pg_copy_logical_replication_slot() have the same issue? Additionally pg_logical_slot_get_changes also has the same issue, it can be reproduced by the following: SELECT pg_create_logical_replication_slot('test_slot_1', 'test_decoding'); DO $$ BEGIN -- This will ERROR if the slot_get changes fails for the slot. PERFORM 1 FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL, 'nonexistent-option', 'val'); EXCEPTION WHEN others THEN RAISE NOTICE 'caught: %', SQLERRM; END $$; SELECT count(*) FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL); TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 80308 postgres: vignesh postgres [local] SELECT(ExceptionalCondition+0xba) [0x642e7b2ebae1] postgres: vignesh postgres [local] SELECT(ReplicationSlotAcquire+0x6e) [0x642e7b00d732] Regards, Vignesh