Thread

  1. [PATCH] pg_stat_lock: add blocker mode dimension

    Tatsuya Kawata <kawatatatsuya0913@gmail.com> — 2026-05-31T09:59:41Z

    Hi,
    
    I have been experimenting with pg_stat_lock and would like to see
    *what kind of operation* is causing waits, not just that waits
    happened.  This patch adds a "mode" dimension so the view can
    distinguish, e.g., waits caused by VACUUM (ShareUpdateExclusiveLock)
    from waits caused by DDL (AccessExclusiveLock), without parsing
    log_lock_waits output.
    
    The holder mode is already computed and emitted by log_lock_waits
    via GetLockHoldersAndWaiters(), so aggregating it into pg_stat_lock
    seems like a natural fit at low cost.
    
    ## Proposal
    
    Add a mode column to pg_stat_lock:
    
      locktype | mode                     | waits | wait_time |
    fastpath_exceeded
    
    ---------+--------------------------+-------+-----------+------------------
      relation | ShareUpdateExclusiveLock |   312 |     89200 |                0
      relation | AccessExclusiveLock      |    47 |    238400 |                0
    
    For waits / wait_time, the mode records the *blocking* mode,
    captured under the lock partition LWLock when the requester joins
    the wait queue:
    
      1. Among modes that conflict with the requester and are currently
         held (lock->granted[m] > 0), pick the strongest one.
      2. If no held mode conflicts (queue-priority-only wait), pick the
         strongest mode in lock->waitMask that conflicts.
    
    Rule (1) takes precedence so the recorded blocker reflects an actual
    holder when one exists, falling back to a queued waiter only for
    pure queue-priority waits.
    
    For fastpath_exceeded, the mode records the *requested* mode (slot
    exhaustion has no blocker); see open question 3 below.
    
    ## Use cases
    
    Three operational questions that are awkward today and trivial
    with this column:
    
    1. "Is VACUUM impacting user workload?"
       -> sum(wait_time) where mode = 'ShareUpdateExclusiveLock'.
    
    2. "Did the last DDL migration cause user-visible waits?"
       -> diff of waits/wait_time on mode = 'AccessExclusiveLock'
          across the deployment window.
    
    3. "Are SELECTs being blocked indirectly via queue priority?"
       -> AccessShare requesters appearing with a stronger blocker mode
          indicate they were queued behind a stronger waiter.
    
    ## Cost
    
      - Shared memory: +~2.3 kB per cluster (one PgStatShared_Lock
        expanded from [locktype] to [locktype][mode]).
      - No new instrumentation in the fast path.  The blocker-mode
        snapshot loop runs only when the request would otherwise wait.
    
    ## Alternatives considered
    
      A) requester_mode only.  Same storage, simpler implementation,
         but the information is largely inferable from locktype plus
         context.  No operational question seemed to need it.
    
      B) Both requester_mode and blocker_mode.  Most informative and
         could be added, but the view output becomes much larger.
         Users wanting that detail probably reach for pg_wait_sampling
         or log_lock_waits already.
    
    This patch proposes "blocker mode only" as a middle ground.
    
    ## Open questions
    
    1. Multi-blocker representation.
       When several conflicting modes are held simultaneously, the
       proposal picks the strongest -- the one whose release the
       waiter needs.  Other conventions worth considering?
    
    2. Per-row attribution for chained waits.
       Each wait event produces one row increment based on the
       proximate blocker.  When waits are chained, multiple rows are
       incremented for what is effectively the same contention:
    
         - TX1 holds AccessShareLock (long SELECT).
         - TX2 requests AccessExclusiveLock, queues behind TX1.
         - TX3 requests AccessShareLock, queues behind TX2.
    
       Result:
         locktype | mode                | waits | wait_time
         relation | AccessShareLock     | 1     | ~90s   (TX2)
         relation | AccessExclusiveLock | 1     | ~89s   (TX3)
    
       Both rows are individually accurate snapshots of "what each
       waiter was queued behind", but a naive read of the second row
       misses that TX1's long SELECT is the underlying cause.  Note
       that sum(wait_time) being per-waiter (not wall-clock contention
       time) is already true of the current pg_stat_lock; this proposal
       just makes it more visible.
    
       Is documenting this contract sufficient, or should the patch
       fold chained waits into a single attribution?  I lean toward
       documenting.
    
    3. Mode column semantics across counters.
       The mode column has two interpretations depending on which
       counter is non-zero:
    
         - waits / wait_time: the *blocker* mode.
         - fastpath_exceeded: the *requested* mode (slot exhaustion
           has no blocker).
    
       Both are useful for diagnosis (the latter shows which workload
       types exhaust per-backend slots).  Carrying two semantics on
       one column is admittedly awkward.  Alternatives:
    
         a) Document the dual semantic, keep one column.
         b) Move fastpath_exceeded to a separate view.
         c) NULL out mode for fastpath_exceeded rows, losing per-mode
            breakdown.
    
       I lean toward (a); the column name "mode" (rather than
       "blocker_mode") was chosen with this in mind.
    
    Draft patch v1 attached. Documentation updates are intentionally omitted
    from v1.
    
    Regards,
    Tatsuya Kawata