Thread

  1. Re: [PATCH v1] Add pg_stat_multixact view for multixact membership usage monitoring

    Naga Appani <nagnrik@gmail.com> — 2025-06-11T01:22:36Z

    On Tue, Jun 10, 2025 at 11:40 AM Andrew Johnson <andrewj@metronome.com> wrote:
    >
    > Hi Naga,
    >
    > Thank you for the thoughtful feedback and for driving attention to
    > this issue. I appreciate you taking the time to review my patch.
    >
    > You raise some good points about the trade-offs between a lightweight
    > function and the pgstat infrastructure. I actually think both
    > approaches have merit for different use cases, and they could
    > potentially coexist to serve the community better.
    >
    > > I shared a patch [0] that adds a SQL-callable function exposing the same counters via ReadMultiXactCounts() without complexity...introducing new statistics infrastructure may be more than what's needed unless there's an additional use case I'm overlooking...A lightweight function seems better aligned with the nature of these metrics and the operational use cases they serve, particularly for historical/ongoing diagnostics and periodic monitoring.
    >
    > I reviewed your patch in depth and I believe the pgstat approach I
    > took offers some advantages for continuous monitoring scenarios:
    >
    > 1. Performance under monitoring load: Many production environments,
    > including Metronome's, will poll these statistics frequently for
    > alerting. Using a direct call to pg_get_multixact_count() ->
    > ReadMultiXactCounts() acquires LWLocks, which could create significant
    > contention when multiple monitoring systems are polling frequently. In
    > high-throughput environments, this could become a bottleneck. The
    > pgstat view reads from shared memory snapshots without additional lock
    > acquisition, making it essentially free since we only update the
    > pgstat structure while we have the lock in the first place.
    >
    > 2. Consistency with existing patterns: PostgreSQL currently uses the
    > pgstat infrastructure for similar global, clusterwide metrics like
    > pg_stat_wal, pg_stat_wal_receiver, pg_stat_archiver, pg_stat_bgwriter,
    > and pg_stat_checkpointer. The multixact member count fits this same
    > pattern of cluster-wide resource monitoring.
    >
    > 3. Automatic updates: The stats update during natural multixact
    > operations (allocation, freeze threshold checks), providing current
    > data without requiring explicit polling of the underlying counters.
    >
    > Your function approach has clear benefits for ad-hoc diagnostics and
    > simpler operational queries where call frequency is low. I also note
    > that your patch tracks both multixacts and members, which provides
    > valuable additional context.
    >
    > I've also included isolation tests that verify the view accurately
    > reflects multixact member allocation, which helps ensure correctness
    > of the monitoring data.
    >
    > Given our production experience with multixact membership exhaustion
    > at Metronome, both approaches would solve the core observability
    > problem.
    >
    > I'm happy to keep discussing what the best approach for the community
    > is. It's great that more light is being shed on this particular issue.
    >
    > [0] https://www.postgresql.org/message-id/CA%2BQeY%2BDTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg%40mail.gmail.com
    >
    > --
    > Respectfully,
    >
    > Andrew Johnson
    > Software Engineer
    > Metronome, Inc.
    
    Hi Andrew,
    
    Thanks again for the thoughtful discussion and reviewing my patch.
    
    I would like to directly address the concern around LWLock contention
    and potential performance overhead from pg_get_multixact_count().
    
    To evaluate this, I ran repeated invocations of both
    pg_get_multixact_count() and pg_stat_multixact under sustained system
    load. At the time of testing, the system showed a significantly high
    load average with notable LWLock contention:
    
    ----------------------------------------------------------------------------------
    $ uptime
     00:23:30 up  2:08,  4 users,  load average: 103.18, 117.93, 116.77
    ----------------------------------------------------------------------------------
    
    ----------------------------------------------------------------------------------
    postgres=# SELECT
        a.wait_event_type,
        COUNT(*) AS count
    FROM pg_stat_activity a
    WHERE a.wait_event_type IS NOT NULL
    GROUP BY a.wait_event_type
    ORDER BY count DESC;
     wait_event_type | count
    -----------------+-------
     LWLock          |  1978
     Client          |  1014
     Timeout         |    21
     Activity        |     6
    (4 rows)
    ----------------------------------------------------------------------------------
    
    To simulate realistic monitoring intervals — typically in the 100ms to
    1s range — I used pgbench with -R 10 (10 queries per second) over a
    60-second duration for both the function and the view. Here are the
    full results:
    
    ----------------------------------------------------------------------------------
    $ pgbench -n -T 60 -R 10 -f <(echo "SELECT * FROM
    pg_get_multixact_count();") -h 127.0.0.1 -p 5593 -U postgres postgres
    pgbench (18beta1)
    transaction type: /dev/fd/63
    scaling factor: 1
    query mode: simple
    number of clients: 1
    number of threads: 1
    maximum number of tries: 1
    duration: 60 s
    number of transactions actually processed: 579
    number of failed transactions: 0 (0.000%)
    latency average = 0.892 ms
    latency stddev = 1.667 ms
    rate limit schedule lag: avg 0.160 (max 8.789) ms
    initial connection time = 7.106 ms
    tps = 9.655158 (without initial connection time)
    ----------------------------------------------------------------------------------
    
    ----------------------------------------------------------------------------------
    $ pgbench -n -T 60 -R 10 -f <(echo "SELECT * FROM pg_stat_multixact;")
    -h 127.0.0.1 -p 5593 -U postgres postgres
    pgbench (18beta1)
    transaction type: /dev/fd/63
    scaling factor: 1
    query mode: simple
    number of clients: 1
    number of threads: 1
    maximum number of tries: 1
    duration: 60 s
    number of transactions actually processed: 589
    number of failed transactions: 0 (0.000%)
    latency average = 0.378 ms
    latency stddev = 0.273 ms
    rate limit schedule lag: avg 0.137 (max 4.347) ms
    initial connection time = 4.516 ms
    tps = 9.846161 (without initial connection time)
    ----------------------------------------------------------------------------------
    
    Even under heavy LWLock contention, both approaches maintained stable
    performance. pg_get_multixact_count() executed comfortably under 1 ms
    on average, and schedule lag remained low, demonstrating that it can
    be safely used in periodic monitoring setups.
    
    I also compared the live counter from the function to the stats
    snapshot from the view:
    
    ----------------------------------------------------------------------------------
    postgres=# SELECT
        'from_pg_get_multixact_count' AS source,
        members
    FROM
        pg_get_multixact_count ()
    UNION ALL
    SELECT
        'from_pg_stat_multixact' AS source,
        members
    FROM
        pg_stat_multixact;
               source            |  members
    -----------------------------+-----------
     from_pg_get_multixact_count | 839438187
     from_pg_stat_multixact      | 839438011
    (2 rows)
    ----------------------------------------------------------------------------------
    
    While the values are quite close, I think it’s worth highlighting that
    pg_get_multixact_count() returns real-time state by calling
    ReadMultiXactCounts() at query time, whereas pg_stat_multixact reports
    values from the statistics collector’s last sampling cycle. Although
    the collector also calls ReadMultiXactCounts() internally, its updates
    are asynchronous and may lag — either due to the
    stats_fetch_consistency setting [1], or in environments with high load
    or memory pressure where stats updates may be delayed.
    
    This distinction was part of the motivation behind my earlier proposal
    [0], which introduced a lightweight SQL-callable function using
    ReadMultiXactCounts(). Since these counters are global, not
    aggregatable per backend, and don’t reset meaningfully, it seemed
    reasonable to expose them without adding new statistics
    infrastructure.
    
    I understand the appeal of following the pg_stat_* view pattern for
    consistency and passive observability, and I think both approaches
    could certainly coexist. The view based design offers a familiar
    interface for long-term monitoring, while the function can complement
    it in scenarios where timely visibility is helpful — such as
    monitoring multixact activity more closely during periods of elevated
    usage or investigating potential pressure before it escalates.
    
    While the function and view can coexist, I believe the function
    already addresses the majority of practical needs efficiently, without
    introducing noticeable contention — even on heavily loaded systems —
    and without adding the complexity of statistics infrastructure.
    
    I would be happy to continue the discussion or help refine the direction.
    
    [0] https://www.postgresql.org/message-id/CA%2BQeY%2BDTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg%40mail.gmail.com
    [1] https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS
    
    Best regards,
    Naga