Thread

  1. Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

    Baji Shaik <baji.pgdev@gmail.com> — 2026-05-18T00:34:20Z

    Hi Alvaro, Sami,
    
    Thank you both for the feedback.
    
    Sami, thanks for identifying the use-after-free with on_proc_exit
    and providing the v2 patch with before_shmem_exit. That helped
    narrow down the right approach.
    
    On Sun, May 17, 2026 at 3:46 PM Alvaro Herrera <alvherre@kurilemu.de> wrote:
    
    > I think a better fix for this is to use PG_ENSURE_ERROR_CLEANUP().  That
    > way we avoid leaving callbacks in place, which would not be great if the
    > same backend does a lot of REPACKs: after a dozen or so, it dies with
    >
    > FATAL:  out of before_shmem_exit slots
    >
    >
    Alvaro, you're right that PG_ENSURE_ERROR_CLEANUP is the better
    approach here. Attached is v3 which uses it instead of
    before_shmem_exit or on_proc_exit.
    
    Summary of the issues with v1 and v2:
    
    - v1 (on_proc_exit): Crashes with assertions enabled because
      memory contexts are already destroyed by the time on_proc_exit
      callbacks run. The worker handle is clobbered with 0x7f
      (CLOBBER_FREED_MEMORY).
    
    - v2 (before_shmem_exit): Works correctly for a single REPACK,
      but leaks a callback slot on each successful REPACK
      (CONCURRENTLY). After ~15 REPACKs in the same session:
      FATAL: out of before_shmem_exit slots.
    
    v3 uses PG_ENSURE_ERROR_CLEANUP which:
    - Handles both ERROR and FATAL exits
    - Automatically cancels the callback on normal completion
      (no slot leak)
    - Runs before memory contexts are destroyed (no use-after-free)
    
    The existing PG_TRY/PG_FINALLY block is replaced with
    PG_ENSURE_ERROR_CLEANUP for the concurrent path, and a plain
    rebuild_relation() call for the non-concurrent path.
    
    Tested with --enable-cassert --enable-debug --enable-injection-points:
    - All 245 regression tests pass (including cluster)
    - All 8 injection point tests pass (including repack and repack_toast)
    - pg_terminate_backend cleans up worker and slot without crash
    - 20 REPACK (CONCURRENTLY) in same session completes without
      slot exhaustion
    
    I have not added a dedicated regression test for the
    pg_terminate_backend scenario yet, but I can write one using
    injection points if needed.
    
    Thanks,
    Baji Shaik