Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

Álvaro Herrera <alvherre@kurilemu.de>

From: Alvaro Herrera <alvherre@kurilemu.de>

To: Baji Shaik <baji.pgdev@gmail.com>

Cc: pgsql-hackers@lists.postgresql.org

Date: 2026-05-19T18:45:05Z

Lists: pgsql-hackers

Attachments

0001-Restructure-repack-worker-teardown.patch (text/x-diff)

On 2026-May-17, Baji Shaik wrote:

> v3 uses PG_ENSURE_ERROR_CLEANUP which:
> - Handles both ERROR and FATAL exits
> - Automatically cancels the callback on normal completion
>   (no slot leak)
> - Runs before memory contexts are destroyed (no use-after-free)

Yeah, looks good.  I have pushed it, with some comment wordsmithing and
other cosmetic changes.

While looking at it, I realized that I didn't like the way
stop_repack_decoding_worker() works, mainly because if there's no
handle, we leak everything else -- and the way we initialize things
means we leak the shared memory segment.  This is maybe a rare case and
just a small memory leak, but it seems better to do it nicely.  So
here's a followup patch that reworks that code.  This also forced me to
understand more clearly what is going on, so I rewrote the comments.
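
To illustrate the kind of leak described above, here is a minimal
standalone sketch.  It is not the code in the attached patch: the
WorkerState struct, its fields, and the function names are all
hypothetical, and malloc/free merely stand in for the worker handle and
the shared memory segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-backend worker state. */
typedef struct WorkerState
{
	void	   *handle;		/* worker handle; NULL if the worker never started */
	void	   *segment;	/* stand-in for the shared memory segment */
	int			released;	/* number of resources released, for illustration */
} WorkerState;

/* Build a state that always has a segment and optionally a handle. */
static WorkerState
make_state(bool with_handle)
{
	WorkerState st;

	st.handle = with_handle ? malloc(16) : NULL;
	st.segment = malloc(16);
	st.released = 0;
	return st;
}

/* Problematic shape: bailing out on a missing handle skips everything else. */
static void
teardown_early_return(WorkerState *st)
{
	assert(st != NULL);
	if (st->handle == NULL)
		return;					/* the segment is leaked on this path */

	free(st->handle);
	st->handle = NULL;
	st->released++;

	free(st->segment);
	st->segment = NULL;
	st->released++;
}

/* Reworked shape: each resource is checked and released on its own. */
static void
teardown_independent(WorkerState *st)
{
	assert(st != NULL);
	if (st->handle != NULL)
	{
		free(st->handle);
		st->handle = NULL;
		st->released++;
	}
	if (st->segment != NULL)
	{
		free(st->segment);
		st->segment = NULL;
		st->released++;
	}
}
```

With a state that has a segment but no handle, teardown_early_return()
leaves the segment allocated, while teardown_independent() releases it.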

> - 20 REPACK (CONCURRENTLY) in same session completes without
>   slot exhaustion

FWIW I tested this by doing "repack (concurrently) foo \watch 0.1" and
letting it run for some time.  I happened to notice that if I have two
psqls running, one with the above and the second with the equivalent for
table bar, when they run together, each runs more quickly than when only
one of them is running.  I don't know what causes this; I suspect
it's because the WAL messages for initial historic snapshot creation
from one session get the other one running.

> I have not added a dedicated regression test for the
> pg_terminate_backend scenario yet, but I can write one using
> injection points if needed.

I don't feel a need for that.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/