Thread

  1. Re: Segmentation fault on proc exit after dshash_find_or_insert

    amit <amitlangote09@gmail.com> — 2025-12-04T16:03:44Z

    Hi,
    
    On Fri, Dec 5, 2025 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
    > On 2025-12-04 11:06:20 +0900, Amit Langote wrote:
    > > On Thu, Dec 4, 2025 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
    > > > I don't agree. You *cannot* rely on lwlocks working after an error before you
    > > > have called LWLockReleaseAll(). I.e. currently *any* use of lwlocks in
    > > > before_shmem_exit is unsafe. The only reason we haven't really noticed that is
    > > > that most of the top-level error handlers (i.e. sigsetjmp()s) do an
    > > > AbortCurrentTransaction(), which does an LWLockReleaseAll() if in a tx, and
    > > > most lwlock acquisitions happen within a transaction. But if you ever do stuff
    > > > outside of a transaction, the AbortCurrentTransaction() won't do
    > > > LWLockReleaseAll(), and you're in trouble, as the case here.
    > > >
    > > > IOW, I think we need to do LWLockReleaseAll() at the top of proc_exit(), at
    > > > least in case of FATAL errors.
    > >
    > > Oh, so not at the top of not shmem_exit() as Rahila has proposed?
    >
    > Oh, ontop of shmem_exit() seems fine, what I was trying to express was that it
    > should happen unconditionally at the start of exit processing, not just at the
    > tail.
    
    Ah, ok.  I was talking about this with Rahila today and she pointed
    out to me that whether we add it to the top of proc_exit() or
    shmem_exit() doesn't really make any material difference to the fact
    that it will get done at the start of exit processing as you say, at
    least today.  So I think we can keep it like Rahila originally
    proposed.
    
    > > > We probably should add a note to LWLockReleaseAll() indicating that we rely on
    > > > LWLockReleaseAll() working even if CreateLWLocks()/InitializeLWLocks() etc
    > > > hasn't yet been called...
    > >
    > > Makes sense.  AFAICS, num_held_lwlocks would be 0 in that case, so
    > > LWLockReleaseAll() would be a no-op.
    >
    > Right. I just meant we should add a comment noting that we rely on that
    > fact...
    
    Ok, got it.  Maybe like this:
    
    @@ -1940,6 +1940,10 @@ LWLockReleaseClearVar(LWLock *lock,
    pg_atomic_uint64 *valptr, uint64 val)
      * unchanged by this operation.  This is necessary since InterruptHoldoffCount
      * has been set to an appropriate level earlier in error recovery. We could
      * decrement it below zero if we allow it to drop for each released lock!
    + *
    + * Note that this function must be safe to call even if the LWLock subsystem
    + * hasn't been initialized (e.g., during early startup error recovery).
    + * In that case, num_held_lwlocks will be 0, and we'll do nothing.
      */
     void
     LWLockReleaseAll(void)
    
    -- 
    Thanks, Amit Langote