Thread

  1. Re: Missing calls to UnlockBuffers() - unify error handling?

    Rahila Syed <rahilasyed90@gmail.com> — 2025-11-14T11:40:35Z

    Hi Andres,
    
    
    > Widening the perspective a bit, our current approach for such
    > error-handling /
    > shutdown functions seems ... not maintainable. In particular we have a
    > substantial number of top-level sigsetjmp() handlers that have slightly
    > different recovery codepaths.  When adding a new process type, how is one
    > supposed to even know what function one needs to call?
    >
    > PostgresMain() has this comment:
    >                 /*
    >                  * NOTE: if you are tempted to add more code in this
    > if-block,
    >                  * consider the high probability that it should be in
    >                  * AbortTransaction() instead.  The only stuff done
    > directly here
    >                  * should be stuff that is guaranteed to apply *only* for
    > outer-level
    >                  * error recovery, such as adjusting the FE/BE protocol
    > status.
    >                  */
    >
    > But a) that clearly hasn't worked, given how long the following block is,
    > and
    > b) can't really work that well, because we have plenty processes that don't
    > use transaction, therefore putting cleanup in AbortTransaction() doesn't go
    > that far.
    >
    
    I don't quite know how it should look like, but it seems pretty obvious that
    > it should be more unified than it is.  My gut feeling is that we should
    > have a
    > single error recovery function that has a flags argument guiding which
    > subsystems should be reset and which shouldn't.
    >
    >
    +1 for the idea. I am interested in writing a patch for the same if you
    would like.
    
    As you mentioned, these code blocks under the if
    (sigsetjmp(local_sigjmp_buf, 1) != 0)
    statement have a very similar set of function calls, depending on the
    process type—whether
    it's an auxiliary process, background process, or client backend.
    There are also similarities across these types; for instance, all of them
    register a ProcKill
    callback as an on_shmem_exit callback..
    
    Currently, we perform LWLockReleaseAll() at the before_shmem_exit stage,
    such as in the ShutdownAuxiliaryProcess() callback for auxiliary processes
    and
    ShutdownPostgres() for client backends.
    However, there are some inconsistencies:
    1.  The client backend does not call LWLockReleaseAll() until ProcKill(),
    if it is not
    in a transaction.
    2. The background processes and AutovacuumLauncher do not register a
    before_shmem_callback
    for calling LWLockReleaseAll() but may invoke it directly within the
    sigsetjmp() blocks.
    3. Some auxiliary processes also call LWLockReleaseAll() early in the
    sigsetjmp() block.
    
    
    > Seems we should just reorder the sequence in ProcKill() to first call
    > LWLockReleaseAll()
    
    
    It would be too late to call it in ProcKill, since dsm_backend_shutdown()
    would
    detach segments containing the LWLocks before this. Using a
    before_shmem_exit callback or
    handling it in the initial steps of sigsetjmp might be more suitable.
    
    Thank you,
    Rahila Syed