Thread

  1. Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

    Naga Appani <nagnrik@gmail.com> — 2025-08-17T06:27:29Z

    Hi Michael, Ashutosh,
    
    Thanks a lot for the detailed reviews and feedback. Please find
    attached v4 of the patchset.
    
    Summary of changes in v4:
    - Split into two patches as suggested:
      1. Expose and rename ReadMultiXactCounts() -> GetMultiXactInfo() in
    multixact.h with clearer comments.
      2. Add pg_get_multixact_stats() as a SQL-callable function in a new
    file (multixactfuncs.c), with docs and tests.
    - Function now also returns oldestOffset for consistency.
    - Field names updated to num_mxids, num_members, oldest_multixact,
    oldest_offset.
    - Documentation revised to describe thresholds only in terms of member
    counts (disk size wording removed).
    - Added a minimal example in maintenance.sgml where multixact
    wraparound is already described.
    - Isolation tests are rewritten so they no longer depend on exact
    counts, but only on monotonic properties guaranteed while a multixact
    is pinned.
    
    Replies inline below:
    
    On Thu, Aug 7, 2025 at 7:35 PM Michael Paquier <michael@paquier.xyz> wrote:
    >
    > I really think that we should move the SQL function parts of multixact.c
    > into their own new file, exposing ReadMultiXactCounts() in multixact.h...
    
    Done. The SQL-callable code now lives in
      src/backend/utils/adt/multixactfuncs.c
    and the accessor is declared in
      src/include/access/multixact.h.
    
    > ReadMultiXactCounts() is also incorrectly named with your proposal to
    > expose oldestMultiXactId in the information returned to the caller...
    > So perhaps this should be named GetMultiXactInformation() or something
    > similar?
    
    Renamed to GetMultiXactInfo().
    
    > The top of ReadMultiXactCounts() (or whatever its new name) should
    > also document the information returned across a single call.
    
    Added detailed comments about consistency under a single LWLock and the
    meaning of each field.
    
    > It looks inconsistent to return oldestMultiXactId if the
    > oldestOffsetKnown is false. What about oldestOffset itself?
    
    GetMultiXactInfo() now returns oldestOffset as well. If the oldest
    offset isn’t currently known, the function returns false and clears
    all outputs, so callers don’t see a partially valid struct.
    
    ---
    
    On Fri, Aug 8, 2025 at 4:33 AM Ashutosh Bapat
    <ashutosh.bapat.oss@gmail.com> wrote:
    >
    > Would it be better to do that math in the function and output the result?
    
    That’s a cool idea, thanks for pointing it out. For now I have kept the
    SQL function focused only on exposing the raw counts (num_mxids,
    num_members, oldest IDs). My thought was that keeping the API lean makes
    it easier to maintain across versions, while leaving any derived
    calculations like approximate storage size to SQL or external tooling.
    This way the function remains simple and future-proof, while still
    giving users the building blocks to get the view they need.
    
    I’m happy to revisit this if others feel it would be better for the
    function to provide an approximate size directly — I wanted to start
    with the simplest surface and gather feedback first.
    
    > Now that the name of the function is changed, we need the names to
    > indicate that they are counts e.g. num_mxids, num_members.
    
    Adjusted. The SQL function returns: num_mxids, num_members,
    oldest_multixact, oldest_offset.
    
    > This file doesn't provide usage examples of other functions. This
    > function doesn't seem to be an exception.
    
    Earlier I thought it was fine to add an example since
    pg_input_error_info() also has one, so in this version I placed the
    example in maintenance.sgml, where we already discuss multixact
    wraparound. That seemed like the most natural place for it. I agree with
    your point about consistency, though, so I kept the style minimal and
    aligned with the surrounding text.
    
    > I think we should mention that the statistics may get stale as soon as
    > it's fetched, even with REPEATABLE READ isolation level.
    
    Added a note that values are a live snapshot and can change immediately.
    
    > In case each member starts consuming more or less space than it does
    > today what would be the basis of triggering wraparound? Space or
    > number of members? I think we should mention only that.
    
    I updated the docs to describe wraparound in terms of member counts only.
    The earlier mention of disk size has been dropped, since the thresholds
    are defined by counts.
    
    > This is the right place to elaborate the usage of this function with an
    > example.
    
    Expanded with a short example, while keeping it consistent with nearby
    entries.
    
    > ... since startup or the number of existing members?
    
    Clarified that the values reflect what’s *currently in use* (i.e.,
    derived from next/oldest) and that NULLs are returned if the multixact
    subsystem has not been initialized yet.
    
    > The last two lines are not required, I think. One of its usage is
    > monitoring but users may find other usages.
    
    Dropped those lines.
    
    > Vacuum may clean the multixact between commit2 and check, in which
    > case the result won't be stable.
    
    Right, the earlier version of the test assumed stable counts, which
    could fail if autovacuum or background cleanup removed entries in
    between steps. In v4 the isolation test no longer relies on exact
    numbers. Instead it asserts only the monotonic properties that are
    guaranteed while a multixact is pinned, and avoids assumptions once
    locks are released. That makes the test robust against concurrent vacuum
    activity.
    
    ---
    
    Thanks again for the thoughtful reviews and guidance. Please let me know
    if you see further adjustments needed.
    
    Best regards,
    Naga