Thread

  1. Re: Improve LWLock tranche name visibility across backends

    Alexander Lakhin <exclusion@gmail.com> — 2025-11-02T10:00:00Z

    Hello Nathan,
    
    04.09.2025 23:37, Nathan Bossart wrote:
    > On Thu, Sep 04, 2025 at 12:30:27PM -0500, Sami Imseih wrote:
    >> I liked removing the repalloc calls inside this routine and did not think
    >> it was worth optimizing. I am OK with reverting it back. Although v1
    >> is incorrect since it's still initializing
    >> NamedLWLockTrancheRequestArray to MAX_NAMED_TRANCHES
    > Committed with that fix.
    >
    >>> Furthermore, the
    >>> MAX_NAMED_TRANCHES check isn't actually needed because InitializeLWLocks()
    >>> will do the same check via its calls to LWLockNewTrancheId() for all the
    >>> named tranche requests.
    >> I thought about that one and decided to add the error message there, since
    >> requesting a tranche happens way before LWLockNewTrancheId is called
    >> during CreateLWLocks, so it was more about erroring out slightly earlier.
    >> But it may be ok to also just remove it.
    > We needed it before because the array could only ever hold
    > MAX_NAMED_TRANCHES requests.
    
    I've discovered (with SQLsmith) that the new possible error makes
    pg_prewarm/autoprewarm fail when the error is triggered during its
    initialization and then another instance tries to acquire apw_state->lock.
    
    Please try the following:
    psql -c "CREATE EXTENSION test_dsa; CREATE EXTENSION pg_prewarm;"
    
    psql -c "SELECT test_dsa_resowners() FROM generate_series(1, 256)" >/dev/null
    
    psql -c "SELECT pg_sleep(0.5); SELECT autoprewarm_dump_now()" &
    
    psql -c "SELECT pg_sleep(1); SELECT autoprewarm_dump_now()"
    
    with
    debug_parallel_query = 'on'
    min_dynamic_shared_memory = '1GB'
    in postgresql.conf
    
    it causes a PANIC for me (on roughly 5 out of 10 runs) as below:
    CREATE EXTENSION
    CREATE EXTENSION
      pg_sleep
    ----------
    
    (1 row)
    
    ERROR:  maximum number of tranches already registered
    DETAIL:  No more than 256 tranches may be registered.
      pg_sleep
    ----------
    
    (1 row)
    
    server closed the connection unexpectedly
             This probably means the server terminated abnormally
             before or while processing the request.
    
    PANIC:  stuck spinlock detected at LWLockWaitListLock, lwlock.c:882
    ...
    #5  0x000057831e809d1d in errfinish (filename=0x57831ea53edd "s_lock.c", lineno=89, funcname=0x57831ea53ee8 <__func__.0> "s_lock_stuck") at elog.c:609
    #6  0x000057831e5f2185 in s_lock_stuck (file=0x57831ea5184c "lwlock.c", line=882, func=0x57831ea51c30 <__func__.5> "LWLockWaitListLock") at s_lock.c:89
    #7  0x000057831e5f2291 in perform_spin_delay (status=0x7ffdf9fabbe0) at s_lock.c:135
    #8  0x000057831e5e4015 in LWLockWaitListLock (lock=0x736a16ad6500) at lwlock.c:886
    #9  0x000057831e5e4475 in LWLockQueueSelf (lock=0x736a16ad6500, mode=LW_EXCLUSIVE) at lwlock.c:1055
    #10 0x000057831e5e4728 in LWLockAcquire (lock=0x736a16ad6500, mode=LW_EXCLUSIVE) at lwlock.c:1259
    #11 0x0000736a63dbcc82 in apw_dump_now (is_bgworker=false, dump_unlogged=true) at autoprewarm.c:676
    #12 0x0000736a63dbd5c9 in autoprewarm_dump_now (fcinfo=0x57835c770fe8) at autoprewarm.c:854
    ...
    
    Best regards,
    Alexander
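
The failure reported above (an error during initialization, followed by a second session getting stuck on the same lock) can be modeled in a small standalone C sketch. This is not PostgreSQL code, and the thread does not confirm the mechanism. It assumes that the shared state becomes visible to other sessions before its lock is initialized, and that the backing memory is recycled rather than zeroed. All `demo_*` and `DEMO_*` names are hypothetical; only the limit of 256 comes from the error message in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_TRANCHES 256   /* mirrors the limit in the reported error */
#define DEMO_FLAG_LOCKED  0x1u  /* hypothetical "wait list locked" bit */

typedef struct DemoLock
{
    unsigned state;
    int      tranche_id;
} DemoLock;

typedef struct DemoShared
{
    bool     published;         /* true once other sessions can find it */
    DemoLock lock;
} DemoShared;

static int demo_tranches_used = 0;

/* Returns a new tranche ID, or -1 once the limit has been reached. */
static int
demo_new_tranche_id(void)
{
    if (demo_tranches_used >= DEMO_MAX_TRANCHES)
        return -1;
    return demo_tranches_used++;
}

/* Attach to the shared state, initializing it on first use.
 * Returns false if initialization failed. */
static bool
demo_attach(DemoShared *shared)
{
    if (!shared->published)
    {
        int id;

        shared->published = true;   /* assumption: visible before init ends */
        id = demo_new_tranche_id();
        if (id < 0)
            return false;           /* error path: lock stays uninitialized */
        shared->lock.state = 0;
        shared->lock.tranche_id = id;
    }
    return true;
}

/* One attempt to take the wait-list lock; a real spinlock would retry. */
static bool
demo_try_wait_list_lock(DemoLock *lock)
{
    if (lock->state & DEMO_FLAG_LOCKED)
        return false;
    lock->state |= DEMO_FLAG_LOCKED;
    return true;
}

/* Replays the scenario: first attach fails, second one sees garbage. */
static bool
demo_second_caller_is_stuck(void)
{
    DemoShared shared;

    memset(&shared, 0xFF, sizeof(shared));  /* recycled, non-zeroed memory */
    shared.published = false;
    demo_tranches_used = DEMO_MAX_TRANCHES; /* all tranche IDs already taken */

    bool first_ok = demo_attach(&shared);   /* fails: no tranche IDs left */
    bool second_ok = demo_attach(&shared);  /* "succeeds" without any init */

    return !first_ok && second_ok && !demo_try_wait_list_lock(&shared.lock);
}
```

In this model the second caller never sees the lock bit clear, which would match a spinlock that eventually reports itself as stuck. If the model is right, one way out would be to publish the shared state only after its initialization has succeeded.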