Thread

  1. Report oldest xmin source when autovacuum cannot remove tuples

    Shinya Kato <shinya11.kato@gmail.com> — 2025-10-31T06:31:40Z

    Hi hackers,
    
    I am proposing to add the reason for the oldest xmin to VACUUM logs.
    This feature would be useful for identifying why dead tuples cannot be
    removed, thereby helping to diagnose and prevent table bloat.
    
    The current logs only indicate that dead tuples could not be reclaimed
    due to the oldest xmin, but they do not reveal the underlying reason.
    To identify the cause, it is necessary to query multiple views:
    pg_stat_activity (for active transactions), pg_prepared_xacts (for
    prepared statements), pg_replication_slots (for replication slots),
    and pg_stat_replication (for hot standby feedback). However, because
    the data in these views is volatile, it is difficult to retroactively
    determine what was holding the oldest xmin at the specific time the
    log message was generated.
    
    This PoC patch addresses this problem. The implementation now outputs
    the reason for the oldest xmin and, where applicable, the backend PID.
    This information was originally discarded when calculating the oldest
    xmin horizon, and the computation required to retrieve these reasons
    is considered reasonable.
    
    The patch is attached. What do you think?
    
    -- 
    Best regards,
    Shinya Kato
    NTT OSS Center
    
  2. Re: Report oldest xmin source when autovacuum cannot remove tuples

    wenhui qiu <qiuwenhuifx@gmail.com> — 2025-10-31T08:00:54Z

    HI
      Thank you for your path ,This path is extremely helpful.
    > +/*
    > + * Identifies what determined a relation's OldestXmin horizon.
    > + * Used by autovacuum to report why dead tuples were not removable.
    > + */
    > +typedef enum OldestXminSource
    > +{
    > + OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION,
    > + OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK,
    > + OLDESTXMIN_SOURCE_PREPARED_TRANSACTION,
    > + OLDESTXMIN_SOURCE_REPLICATION_SLOT,
    > + OLDESTXMIN_SOURCE_OTHER
    > +} OldestXminSource;
    > +
    > +typedef struct OldestXminInfo
    > +{
    > + OldestXminSource source;
    > + int backend_pid;
    > +} OldestXminInfo;
    I have a question for like this
    one session
    begin;
    select * from table_a
    not commit or not closed session
    It is in OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION type ?
    
    
    
    Thank
    
    On Fri, Oct 31, 2025 at 2:32 PM Shinya Kato <shinya11.kato@gmail.com> wrote:
    
    > Hi hackers,
    >
    > I am proposing to add the reason for the oldest xmin to VACUUM logs.
    > This feature would be useful for identifying why dead tuples cannot be
    > removed, thereby helping to diagnose and prevent table bloat.
    >
    > The current logs only indicate that dead tuples could not be reclaimed
    > due to the oldest xmin, but they do not reveal the underlying reason.
    > To identify the cause, it is necessary to query multiple views:
    > pg_stat_activity (for active transactions), pg_prepared_xacts (for
    > prepared statements), pg_replication_slots (for replication slots),
    > and pg_stat_replication (for hot standby feedback). However, because
    > the data in these views is volatile, it is difficult to retroactively
    > determine what was holding the oldest xmin at the specific time the
    > log message was generated.
    >
    > This PoC patch addresses this problem. The implementation now outputs
    > the reason for the oldest xmin and, where applicable, the backend PID.
    > This information was originally discarded when calculating the oldest
    > xmin horizon, and the computation required to retrieve these reasons
    > is considered reasonable.
    >
    > The patch is attached. What do you think?
    >
    > --
    > Best regards,
    > Shinya Kato
    > NTT OSS Center
    >
    
  3. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Shinya Kato <shinya11.kato@gmail.com> — 2025-11-04T01:29:26Z

    On Fri, Oct 31, 2025 at 5:01 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
    >
    > HI
    >   Thank you for your path ,This path is extremely helpful.
    
    Thank you!
    
    > > +/*
    > > + * Identifies what determined a relation's OldestXmin horizon.
    > > + * Used by autovacuum to report why dead tuples were not removable.
    > > + */
    > > +typedef enum OldestXminSource
    > > +{
    > > + OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION,
    > > + OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK,
    > > + OLDESTXMIN_SOURCE_PREPARED_TRANSACTION,
    > > + OLDESTXMIN_SOURCE_REPLICATION_SLOT,
    > > + OLDESTXMIN_SOURCE_OTHER
    > > +} OldestXminSource;
    > > +
    > > +typedef struct OldestXminInfo
    > > +{
    > > + OldestXminSource source;
    > > + int backend_pid;
    > > +} OldestXminInfo;
    > I have a question for like this
    > one session
    > begin;
    > select * from table_a
    > not commit or not closed session
    > It is in OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION type ?
    
    Exactly. Looking at 010_autovacuum_oldest_xmin_reason.pl should make
    it clear which logs are output in which cases. (I just noticed there
    seems to be a case where the test fails. I need to fix that.)
    
    -- 
    Best regards,
    Shinya Kato
    NTT OSS Center
    
    
    
    
  4. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Fujii Masao <masao.fujii@gmail.com> — 2025-11-04T15:58:51Z

    On Fri, Oct 31, 2025 at 3:32 PM Shinya Kato <shinya11.kato@gmail.com> wrote:
    >
    > Hi hackers,
    >
    > I am proposing to add the reason for the oldest xmin to VACUUM logs.
    > This feature would be useful for identifying why dead tuples cannot be
    > removed, thereby helping to diagnose and prevent table bloat.
    
    +1
    
    I like this idea. Thanks for working on this!
    
    
    > This PoC patch addresses this problem. The implementation now outputs
    > the reason for the oldest xmin and, where applicable, the backend PID.
    > This information was originally discarded when calculating the oldest
    > xmin horizon, and the computation required to retrieve these reasons
    > is considered reasonable.
    >
    > The patch is attached. What do you think?
    
    According to cfbot, the 010_autovacuum_oldest_xmin_reason regression test
    passes on some platforms but fails on others (see [1]), so it doesn't
    appear stable.
    
    
    When I set up a primary and standby with hot_standby_feedback enabled,
    then created an old prepared transaction expected to prevent dead tuples
    from being vacuumed, VACUUM VERBOSE reported "hot standby feedback"
    instead of "prepared transaction" as the oldest xmin source. This isn't a bug
    since both xmins are the same in this case. But it may be confusing?
    Would it be better to report "prepared transaction" in such cases?
    
    
    + case OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION:
    + msgfmt = include_pid ?
    + _("oldest xmin source: active transaction (pid=%d)\n") :
    + _("oldest xmin source: active transaction\n");
    + break;
    + case OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK:
    + msgfmt = include_pid ?
    + _("oldest xmin source: hot standby feedback (pid=%d)\n") :
    + _("oldest xmin source: hot standby feedback\n");
    
    In these two cases, the PID should always be non-zero, so the message
    formats without (pid=%d) might not be necessary.
    
    
    + /* Identify what forced each of the horizons above. */
    + OldestXminInfo shared_oldest_nonremovable_info;
    + OldestXminInfo catalog_oldest_nonremovable_info;
    + OldestXminInfo data_oldest_nonremovable_info;
    + OldestXminInfo temp_oldest_nonremovable_info;
    
    It might be good to add a comment explaining why we track
    sources only for these four oldest xmins, and not for others
    like oldest_considered_running.
    
    
    + TransactionId old;
    + TransactionId new_horizon;
    +
    + if (!TransactionIdIsValid(candidate))
    + return;
    
    The TransactionIdIsValid(candidate) check may be redundant,
    since TransactionIdOlder(old, candidate) already performs
    the same validation.
    
    
    - switch (GlobalVisHorizonKindForRel(rel))
    + kind = GlobalVisHorizonKindForRel(rel);
    + switch (kind)
    
    This change doesn't seem necessary.
    
    Regards,
    
    [1] https://cirrus-ci.com/task/6063548834512896
    
    -- 
    Fujii Masao
    
    
    
    
  5. Re: Report oldest xmin source when autovacuum cannot remove tuples

    wenhui qiu <qiuwenhuifx@gmail.com> — 2025-11-11T10:40:04Z

    Hi
    Thank you for your response , In this context, that session  state is idle
    in transaction .If we could further distinguish between active sessions
    that are still running and idle sessions, that would be clearer.
    
    
    
    
    Thanks
    
    On Tue, Nov 4, 2025 at 11:59 PM Fujii Masao <masao.fujii@gmail.com> wrote:
    
    > On Fri, Oct 31, 2025 at 3:32 PM Shinya Kato <shinya11.kato@gmail.com>
    > wrote:
    > >
    > > Hi hackers,
    > >
    > > I am proposing to add the reason for the oldest xmin to VACUUM logs.
    > > This feature would be useful for identifying why dead tuples cannot be
    > > removed, thereby helping to diagnose and prevent table bloat.
    >
    > +1
    >
    > I like this idea. Thanks for working on this!
    >
    >
    > > This PoC patch addresses this problem. The implementation now outputs
    > > the reason for the oldest xmin and, where applicable, the backend PID.
    > > This information was originally discarded when calculating the oldest
    > > xmin horizon, and the computation required to retrieve these reasons
    > > is considered reasonable.
    > >
    > > The patch is attached. What do you think?
    >
    > According to cfbot, the 010_autovacuum_oldest_xmin_reason regression test
    > passes on some platforms but fails on others (see [1]), so it doesn't
    > appear stable.
    >
    >
    > When I set up a primary and standby with hot_standby_feedback enabled,
    > then created an old prepared transaction expected to prevent dead tuples
    > from being vacuumed, VACUUM VERBOSE reported "hot standby feedback"
    > instead of "prepared transaction" as the oldest xmin source. This isn't a
    > bug
    > since both xmins are the same in this case. But it may be confusing?
    > Would it be better to report "prepared transaction" in such cases?
    >
    >
    > + case OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION:
    > + msgfmt = include_pid ?
    > + _("oldest xmin source: active transaction (pid=%d)\n") :
    > + _("oldest xmin source: active transaction\n");
    > + break;
    > + case OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK:
    > + msgfmt = include_pid ?
    > + _("oldest xmin source: hot standby feedback (pid=%d)\n") :
    > + _("oldest xmin source: hot standby feedback\n");
    >
    > In these two cases, the PID should always be non-zero, so the message
    > formats without (pid=%d) might not be necessary.
    >
    >
    > + /* Identify what forced each of the horizons above. */
    > + OldestXminInfo shared_oldest_nonremovable_info;
    > + OldestXminInfo catalog_oldest_nonremovable_info;
    > + OldestXminInfo data_oldest_nonremovable_info;
    > + OldestXminInfo temp_oldest_nonremovable_info;
    >
    > It might be good to add a comment explaining why we track
    > sources only for these four oldest xmins, and not for others
    > like oldest_considered_running.
    >
    >
    > + TransactionId old;
    > + TransactionId new_horizon;
    > +
    > + if (!TransactionIdIsValid(candidate))
    > + return;
    >
    > The TransactionIdIsValid(candidate) check may be redundant,
    > since TransactionIdOlder(old, candidate) already performs
    > the same validation.
    >
    >
    > - switch (GlobalVisHorizonKindForRel(rel))
    > + kind = GlobalVisHorizonKindForRel(rel);
    > + switch (kind)
    >
    > This change doesn't seem necessary.
    >
    > Regards,
    >
    > [1] https://cirrus-ci.com/task/6063548834512896
    >
    > --
    > Fujii Masao
    >
    >
    >
    
  6. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Sami Imseih <samimseih@gmail.com> — 2025-11-15T00:25:36Z

    Thanks for starting this thread! This is a very useful
    feature that users will find beneficial to easily narrow
    down the reason the xmin horizon is being held back,
    and take action.
    
    Adding this information to the vacuum logging is useful, but
    I can see this information being exposed in a view as well in
    the future.
    
    I have a few comments:
    
    A few minor ones:
    
    1/ pid should be declared as "pid_t"
    
    2/ last value of an enum should be have a traling comma
    +typedef enum OldestXminSource
    +{
    +       OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION,
    +       OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK,
    +       OLDESTXMIN_SOURCE_PREPARED_TRANSACTION,
    +       OLDESTXMIN_SOURCE_REPLICATION_SLOT,
    +       OLDESTXMIN_SOURCE_OTHER
    +} OldestXminSource;
    
    More importantly:
    
    3/ As mentioned earlier in the thread, the "idle-in-transaction"
    transactions is not being reported correctly, particularly for write
    tansactions. I think that is an important missing case. The reason
    for this is the cutoff xmin is not being looked up against the current
    list of xid's, so we are not blaming the correct pid.
    
    4/
    Thinking about point 3 above, I began to wonder if this
    whole thing can be simplified with inspiration. Looking at the
    existing  BackendXidGetPid(), I think it can.
    
    Based on BackendXidGetPid(), I tried a new routine called
    BackendXidFindCutOffReason() which can take in the cutoff xmin,
    passed in by vacuum and can walk though the proc array and
    determine the reason. We don't need to touch ComputeXidHorizons()
    to make this work, it seems to me. This comes with an additional
    walk though the procarray holding a shared lock, but I don't think
    this will be an issue.
    
    Attached is a rough sketch of BackendXidFindCutOffReason()
    For now, I just added NOTICE messages which will log with
    VACUUM (verbose) for testing.
    
    This takes what you are doing in v1 inside ComputeXidHorizons()
    into a new routine. I think this is a cleaner approach.
    
    5/ Also, I think we should also include tests for serializable
    transactions
    
    
    What do you think?
    
    --
    
    Sami Imseih
    Amazon Web Services (AWS)
    
  7. Re: Report oldest xmin source when autovacuum cannot remove tuples

    wenhui qiu <qiuwenhuifx@gmail.com> — 2025-11-17T01:43:05Z

    Hi Sami
    
    > Thinking about point 3 above, I began to wonder if this
    > whole thing can be simplified with inspiration. Looking at the
    > existing  BackendXidGetPid(), I think it can.
    
    > Based on BackendXidGetPid(), I tried a new routine called
    > BackendXidFindCutOffReason() which can take in the cutoff xmin,
    > passed in by vacuum and can walk though the proc array and
    > determine the reason. We don't need to touch ComputeXidHorizons()
    > to make this work, it seems to me. This comes with an additional
    > walk though the procarray holding a shared lock, but I don't think
    > this will be an issue.
    
    > Attached is a rough sketch of BackendXidFindCutOffReason()
    > For now, I just added NOTICE messages which will log with
    > VACUUM (verbose) for testing.
    I like your idea , I think we also could consider introducing a GUC
    parameter in the future, which would terminate sessions blocking vacuum
    operations when the table's age reaches vacuum_failsafe_age.
    
    
    Thanks
    
    On Sat, Nov 15, 2025 at 8:25 AM Sami Imseih <samimseih@gmail.com> wrote:
    
    > Thanks for starting this thread! This is a very useful
    > feature that users will find beneficial to easily narrow
    > down the reason the xmin horizon is being held back,
    > and take action.
    >
    > Adding this information to the vacuum logging is useful, but
    > I can see this information being exposed in a view as well in
    > the future.
    >
    > I have a few comments:
    >
    > A few minor ones:
    >
    > 1/ pid should be declared as "pid_t"
    >
    > 2/ last value of an enum should be have a traling comma
    > +typedef enum OldestXminSource
    > +{
    > +       OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION,
    > +       OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK,
    > +       OLDESTXMIN_SOURCE_PREPARED_TRANSACTION,
    > +       OLDESTXMIN_SOURCE_REPLICATION_SLOT,
    > +       OLDESTXMIN_SOURCE_OTHER
    > +} OldestXminSource;
    >
    > More importantly:
    >
    > 3/ As mentioned earlier in the thread, the "idle-in-transaction"
    > transactions is not being reported correctly, particularly for write
    > tansactions. I think that is an important missing case. The reason
    > for this is the cutoff xmin is not being looked up against the current
    > list of xid's, so we are not blaming the correct pid.
    >
    > 4/
    > Thinking about point 3 above, I began to wonder if this
    > whole thing can be simplified with inspiration. Looking at the
    > existing  BackendXidGetPid(), I think it can.
    >
    > Based on BackendXidGetPid(), I tried a new routine called
    > BackendXidFindCutOffReason() which can take in the cutoff xmin,
    > passed in by vacuum and can walk though the proc array and
    > determine the reason. We don't need to touch ComputeXidHorizons()
    > to make this work, it seems to me. This comes with an additional
    > walk though the procarray holding a shared lock, but I don't think
    > this will be an issue.
    >
    > Attached is a rough sketch of BackendXidFindCutOffReason()
    > For now, I just added NOTICE messages which will log with
    > VACUUM (verbose) for testing.
    >
    > This takes what you are doing in v1 inside ComputeXidHorizons()
    > into a new routine. I think this is a cleaner approach.
    >
    > 5/ Also, I think we should also include tests for serializable
    > transactions
    >
    >
    > What do you think?
    >
    > --
    >
    > Sami Imseih
    > Amazon Web Services (AWS)
    >
    
  8. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Dilip Kumar <dilipbalaut@gmail.com> — 2025-11-17T02:51:04Z

    On Sat, Nov 15, 2025 at 5:56 AM Sami Imseih <samimseih@gmail.com> wrote:
    >
    > Thanks for starting this thread! This is a very useful
    > feature that users will find beneficial to easily narrow
    > down the reason the xmin horizon is being held back,
    > and take action.
    
    +1 for the idea.  In BackendXidFindCutOffReason() you have directly
    reported using NOTICE I believe that is just to show the idea and you
    are planning to append this to the main message?  Apart from that we
    are looping the whole pgprocarray, however it is only done when we are
    vacuuming with verbose mode so might not be that bad.
    
    -- 
    Regards,
    Dilip Kumar
    Google
    
    
    
    
  9. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Sami Imseih <samimseih@gmail.com> — 2025-11-17T18:35:46Z

    > > Thanks for starting this thread! This is a very useful
    > > feature that users will find beneficial to easily narrow
    > > down the reason the xmin horizon is being held back,
    > > and take action.
    >
    > +1 for the idea.  In BackendXidFindCutOffReason() you have directly
    > reported using NOTICE I believe that is just to show the idea and you
    > are planning to append this to the main message?
    
    Yes, the NOTICE is just for demonstration. Some more work is needed
    to integrate the output into the vacuum log.
    
    > Apart from that we are looping the whole pgprocarray, however it is
    > only done when we are vacuuming with verbose mode so might not
    > be that bad.
    
    The extra procarray loop will occur during VACUUM VERBOSE
    or when a vacuum exceeds log_autovacuum_min_duration, 10
    minutes by default.
    
    I do think however, we should be more selective when to
    do this work. We should only care about emitting this
    information in autovacuum logging if the cutoffs->OldestXmid
    did not advance between consecutive vacuums. This will
    mean we we will need to track the last cutoff value in relation stats
    (pg_stat_user_tables), but I think having this value in
    stats will be useful on its own actually, as users can use
    it to track tables with stalled cutoffs even if they don't
    have sufficient autovacuum logging.
    
    What do you think?
    
    --
    Sami Imseih
    Amazon Web Services (AWS)
    
    
    
    
  10. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Andres Freund <andres@anarazel.de> — 2025-12-18T00:30:04Z

    Hi,
    
    On 2025-10-31 15:31:40 +0900, Shinya Kato wrote:
    > The patch is attached. What do you think?
    
    The added tests never seem to pass in CI:
    https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F6188
    
    Greetings,
    
    Andres Freund
    
    
    
    
  11. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Shinya Kato <shinya11.kato@gmail.com> — 2025-12-26T05:34:01Z

    Thank you all for the review comments, and sorry for the late reply.
    I will address the review comments in order.
    
    On Sat, Nov 15, 2025 at 9:25 AM Sami Imseih <samimseih@gmail.com> wrote:
    > More importantly:
    >
    > 3/ As mentioned earlier in the thread, the "idle-in-transaction"
    > transactions is not being reported correctly, particularly for write
    > tansactions. I think that is an important missing case. The reason
    > for this is the cutoff xmin is not being looked up against the current
    > list of xid's, so we are not blaming the correct pid.
    >
    > 4/
    > Thinking about point 3 above, I began to wonder if this
    > whole thing can be simplified with inspiration. Looking at the
    > existing  BackendXidGetPid(), I think it can.
    >
    > Based on BackendXidGetPid(), I tried a new routine called
    > BackendXidFindCutOffReason() which can take in the cutoff xmin,
    > passed in by vacuum and can walk though the proc array and
    > determine the reason. We don't need to touch ComputeXidHorizons()
    > to make this work, it seems to me. This comes with an additional
    > walk though the procarray holding a shared lock, but I don't think
    > this will be an issue.
    >
    > Attached is a rough sketch of BackendXidFindCutOffReason()
    > For now, I just added NOTICE messages which will log with
    > VACUUM (verbose) for testing.
    
    Thanks for the revised proposal! Your approach is clear and makes the
    code easier to read. However, I’m hesitant to proceed with this idea
    for the following reasons:
    
    - The original proposal extends ComputeXidHorizons(), which is always
    calculated, so there is almost no additional overhead.
    
    - Your proposal incurs additional cost. Furthermore, the time lag
    between the execution of ComputeXidHorizons() and
    BackendXidFindCutOffReason() could lead to inaccurate logging.
    
    - I don't believe it is necessary to distinguish between active
    transactions and "idle in transaction." These states can change
    rapidly, and as long as we have the PID, we can check the current
    status via pg_stat_activity.
    
    - Your comment made me realize that it might be appropriate to expose
    the oldest xmin in the pg_stat_{all,user,sys}_tables views, rather
    than just logging it. In that case, we would need to calculate the
    oldest xmin horizon every time. This might be a topic for a separate
    thread, but we could consider adding columns such as:
      - pg_stat_{all,user,sys}_tables.last_vacuum_oldest_xmin (xid)
      - pg_stat_{all,user,sys}_tables.last_vacuum_oldest_xmin_source (text)
    
    
    
    --
    Best regards,
    Shinya Kato
    NTT OSS Center
    
    
    
    
  12. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Shinya Kato <shinya11.kato@gmail.com> — 2026-05-22T13:05:31Z

    Thank you for your feedback!
    
    On Mon, Mar 16, 2026 at 8:19 PM Japin Li <japinli@hotmail.com> wrote:
    >
    > On Mon, 16 Mar 2026 at 15:59, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
    > > HI Shinya
    > >> typedef enum XidHorizonBlockerType
    > >> {
    > >>     XHB_NONE = 0,
    > >>     XHB_ACTIVE_TRANSACTION,
    > >>     XHB_IDLE_IN_TRANSACTION,
    > >>     XHB_PREPARED_TRANSACTION,
    > >>     XHB_XMIN_ACTIVE_TRANSACTION,
    > >>     XHB_XMIN_IDLE_IN_TRANSACTION,
    > >>     XHB_HOT_STANDBY_FEEDBACK,
    > >>     XHB_REPLICATION_SLOT,
    > >> }
    > > Thank you for your working on this ,I have another small suggestion
    > > The priority ordering encoded in XidHorizonBlockerType determines which blocker gets reported when multiple candidates
    > > exist. In particular:
    > >
    > > ACTIVE_TRANSACTION
    > > IDLE_IN_TRANSACTION
    > > PREPARED_TRANSACTION
    > >
    > > Prepared transactions are currently ranked after idle-in-transaction sessions. Operationally, prepared transactions are
    > > often harder for DBAs to resolve than idle sessions, so it might be worth clarifying the rationale behind this ordering
    > > or reconsidering whether prepared transactions should have higher priority.
    >
    > Agreed.  Explaining the reason for this priority is very helpful.
    
    We always pick a blocker from the xid-match group first (it is the
    transaction actually holding the horizon, while the xmin-match entries
    are just held back by it). Within the xid-match group, the
    active/idle/prepared order never matters: a given xid is owned by only
    one backend, so when the horizon equals a proc's xid there is only one
    matching entry, and it is exactly one of active, idle, or prepared. So
    moving prepared ahead of idle would not change which blocker we
    report.
    
    > >> typedef enum XidHorizonBlockerType
    > >> {
    > >>     XHB_NONE = 0,
    > >>     XHB_ACTIVE_TRANSACTION,
    > >>     XHB_PREPARED_TRANSACTION,
    > >>     XHB_IDLE_IN_TRANSACTION,
    > >>     XHB_XMIN_ACTIVE_TRANSACTION,
    > >>     XHB_XMIN_IDLE_IN_TRANSACTION,
    > >>     XHB_HOT_STANDBY_FEEDBACK,
    > >>     XHB_REPLICATION_SLOT,
    > >> }
    > > Another one:
    > > Currently GetXidHorizonBlocker() selects only one blocker (based on the enum priority) even though multiple independent
    > > sources could hold back the xmin horizon simultaneously. For example, it is possible to have both a prepared transaction
    > > and a replication slot preventing the horizon from advancing.
    > > Have you considered reporting all detected blockers instead of just the highest-priority one? Returning only a single
    > > entry might hide other relevant blockers from the user.
    > >
    >
    > I'm also curious — why don't we list all the blockers? Did I miss anything?
    
    I did think about this, but I would like to keep reporting one blocker
    in the VACUUM log, for two reasons.
    
    First, the log can get very large. In Sami's earlier example [0], a
    pgbench run had many backends all sharing the same xmin while only one
    idle-in-transaction backend actually owned the cutoff xid. Reporting
    every blocker would print 20+ lines, almost all of them just victims
    of the same root cause, which makes the log harder to read, not
    easier.
    
    Second, the one blocker we report is the root cause (the xid owner).
    Once the DBA resolves it, the next VACUUM will show the next blocker
    if one remains.
    
    This is also why the code is split into GetXidHorizonBlockers(), which
    already collects every candidate, and GetXidHorizonBlocker(), which
    picks the highest-priority one for the log. The "show everything" case
    is what I would like to expose later through a dynamic statistics
    view, where a full list makes more sense than in a VACUUM log line.
    
    
    I've rebased the patch.
    
    P.S. It might be better to use bottom posting instead of top posting [1].
    
    [0] https://www.postgresql.org/message-id/CAA5RZ0sjMgMo4Xg-niyyF-CpkQ_CK6uOfNKYT%3D9RmiBkAxQkbQ%40mail.gmail.com
    [1] https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics
    
    -- 
    Best regards,
    Shinya Kato
    NTT OSS Center
    
  13. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Japin Li <japinli@hotmail.com> — 2026-05-23T01:40:39Z

    On Fri, 22 May 2026 at 22:05, Shinya Kato <shinya11.kato@gmail.com> wrote:
    > Thank you for your feedback!
    >
    > On Mon, Mar 16, 2026 at 8:19 PM Japin Li <japinli@hotmail.com> wrote:
    >>
    >> On Mon, 16 Mar 2026 at 15:59, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
    >> > HI Shinya
    >> >> typedef enum XidHorizonBlockerType
    >> >> {
    >> >>     XHB_NONE = 0,
    >> >>     XHB_ACTIVE_TRANSACTION,
    >> >>     XHB_IDLE_IN_TRANSACTION,
    >> >>     XHB_PREPARED_TRANSACTION,
    >> >>     XHB_XMIN_ACTIVE_TRANSACTION,
    >> >>     XHB_XMIN_IDLE_IN_TRANSACTION,
    >> >>     XHB_HOT_STANDBY_FEEDBACK,
    >> >>     XHB_REPLICATION_SLOT,
    >> >> }
    >> > Thank you for your working on this ,I have another small suggestion
    >> > The priority ordering encoded in XidHorizonBlockerType determines which blocker gets reported when multiple candidates
    >> > exist. In particular:
    >> >
    >> > ACTIVE_TRANSACTION
    >> > IDLE_IN_TRANSACTION
    >> > PREPARED_TRANSACTION
    >> >
    >> > Prepared transactions are currently ranked after idle-in-transaction sessions. Operationally, prepared transactions are
    >> > often harder for DBAs to resolve than idle sessions, so it might be worth clarifying the rationale behind this ordering
    >> > or reconsidering whether prepared transactions should have higher priority.
    >>
    >> Agreed.  Explaining the reason for this priority is very helpful.
    >
    > We always pick a blocker from the xid-match group first (it is the
    > transaction actually holding the horizon, while the xmin-match entries
    > are just held back by it). Within the xid-match group, the
    > active/idle/prepared order never matters: a given xid is owned by only
    > one backend, so when the horizon equals a proc's xid there is only one
    > matching entry, and it is exactly one of active, idle, or prepared. So
    > moving prepared ahead of idle would not change which blocker we
    > report.
    >
    >> >> typedef enum XidHorizonBlockerType
    >> >> {
    >> >>     XHB_NONE = 0,
    >> >>     XHB_ACTIVE_TRANSACTION,
    >> >>     XHB_PREPARED_TRANSACTION,
    >> >>     XHB_IDLE_IN_TRANSACTION,
    >> >>     XHB_XMIN_ACTIVE_TRANSACTION,
    >> >>     XHB_XMIN_IDLE_IN_TRANSACTION,
    >> >>     XHB_HOT_STANDBY_FEEDBACK,
    >> >>     XHB_REPLICATION_SLOT,
    >> >> }
    >> > Another one:
    >> > Currently GetXidHorizonBlocker() selects only one blocker (based on the enum priority) even though multiple independent
    >> > sources could hold back the xmin horizon simultaneously. For example, it is possible to have both a prepared transaction
    >> > and a replication slot preventing the horizon from advancing.
    >> > Have you considered reporting all detected blockers instead of just the highest-priority one? Returning only a single
    >> > entry might hide other relevant blockers from the user.
    >> >
    >>
    >> I'm also curious — why don't we list all the blockers? Did I miss anything?
    >
    > I did think about this, but I would like to keep reporting one blocker
    > in the VACUUM log, for two reasons.
    >
    > First, the log can get very large. In Sami's earlier example [0], a
    > pgbench run had many backends all sharing the same xmin while only one
    > idle-in-transaction backend actually owned the cutoff xid. Reporting
    > every blocker would print 20+ lines, almost all of them just victims
    > of the same root cause, which makes the log harder to read, not
    > easier.
    >
    > Second, the one blocker we report is the root cause (the xid owner).
    > Once the DBA resolves it, the next VACUUM will show the next blocker
    > if one remains.
    >
    > This is also why the code is split into GetXidHorizonBlockers(), which
    > already collects every candidate, and GetXidHorizonBlocker(), which
    > picks the highest-priority one for the log. The "show everything" case
    > is what I would like to expose later through a dynamic statistics
    > view, where a full list makes more sense than in a VACUUM log line.
    >
    >
    > I've rebased the patch.
    >
    
    Thanks for updating the patch. LGTM. Just one nitpick.
    
    +	int			pid;			/* backend pid (0 for slots) */
    
    Per the code, the prepared transaction is also associated with a PID of zero.
    
    -- 
    Regards,
    Japin Li
    ChengDu WenWu Information Technology Co., Ltd.
    
    
    
    
  14. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Shinya Kato <shinya11.kato@gmail.com> — 2026-05-23T05:22:48Z

    On Sat, May 23, 2026 at 10:40 AM Japin Li <japinli@hotmail.com> wrote:
    > Thanks for updating the patch. LGTM. Just one nitpick.
    >
    > +       int                     pid;                    /* backend pid (0 for slots) */
    >
    > Per the code, the prepared transaction is also associated with a PID of zero.
    
    Thanks for the review! You're right, prepared transactions also have
    pid 0. Fixed the comment to mention that.
    
    Additionally, I forgot to update meson.build, which caused the tests
    to fail. I have fixed that in the attached patch.
    
    -- 
    Best regards,
    Shinya Kato
    NTT OSS Center
    
  15. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Shinya Kato <shinya11.kato@gmail.com> — 2026-05-23T06:00:33Z

    On Sat, May 23, 2026 at 2:22 PM Shinya Kato <shinya11.kato@gmail.com> wrote:
    > Additionally, I forgot to update meson.build, which caused the tests
    > to fail. I have fixed that in the attached patch.
    
    Oops, I made a slight mistake. Fixed.
    
    -- 
    Best regards,
    Shinya Kato
    NTT OSS Center
    
  16. Re: Report oldest xmin source when autovacuum cannot remove tuples

    wenhui qiu <qiuwenhuifx@gmail.com> — 2026-05-23T10:17:26Z

    Hi
    Thank you updating the path,LGTM
    
    
    
    
    
    Thanks
    
  17. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Laurenz Albe <laurenz.albe@cybertec.at> — 2026-05-23T13:35:37Z

    On Sat, 2026-05-23 at 15:00 +0900, Shinya Kato wrote:
    > Oops, I made a slight mistake. Fixed.
    
    I think that is a great feature that will improve users' lives!
    
    The patch looks fine to me too.
    I guess the overhead of one more scan of the process array for every autovacuum
    run if "log_autovacuum_min_duration" is non-zero (which is the default)
    is acceptable.
    
    Yours,
    Laurenz Albe
    
    
    
    
  18. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Shinya Kato <shinya11.kato@gmail.com> — 2026-05-28T05:34:54Z

    On Sat, May 23, 2026 at 10:35 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
    > The patch looks fine to me too.
    
    Thanks all for the reviews and LGTMs!
    
    Before moving this forward, I noticed there is one thing in the patch
    that I should fix first.
    
    When hot_standby_feedback=on, the location where the standby's xmin is
    recorded on the primary depends on whether physical replication slot
    is used:
    
    - Without a replication slot: the xmin is held on the walsender's PGPROC
    - With a replication slot: the xmin is held on the replication slot itself
    
    I summarized this behavior in my blog:
    https://dev.to/shinyakato_/4-causes-of-table-bloat-in-postgresql-and-how-to-address-them-3ec9
    
    The current patch reports the latter case as XHB_REPLICATION_SLOT with
    the message "logical replication slot", which is misleading because it
    is actually a physical slot used by a standby with
    hot_standby_feedback=on. I will fix this and post an updated patch
    later.
    
    
    
    
    --
    Best regards,
    Shinya Kato
    NTT OSS Center
    
    
    
    
  19. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Scott Ray <scott@scottray.io> — 2026-05-30T02:31:27Z

    > I guess the overhead of one more scan of the process array for every autovacuum
    > run if "log_autovacuum_min_duration" is non-zero (which is the default)
    > is acceptable.
    
    
    Could vacuum compute the blocker during ComputeXidHorizons and consume it at log time?
    Avoids the extra scan, and binds the blocker to the horizon vacuum used for pruning.
    
    Scott Ray
  20. Re: Report oldest xmin source when autovacuum cannot remove tuples

    Shinya Kato <shinya11.kato@gmail.com> — 2026-05-31T13:06:36Z

    On Sat, May 30, 2026 at 11:31 AM Scott Ray <scott@scottray.io> wrote:
    >
    > > I guess the overhead of one more scan of the process array for every autovacuum
    > > run if "log_autovacuum_min_duration" is non-zero (which is the default)
    > > is acceptable.
    >
    >
    > Could vacuum compute the blocker during ComputeXidHorizons and consume it at log time?
    > Avoids the extra scan, and binds the blocker to the horizon vacuum used for pruning.
    
    As Sami suggested in [0], I think this design is a cleaner approach
    because it separates the responsibilities of ComputeXidHorizons and
    the calculation of vacuum blockers.
    
    [0] https://www.postgresql.org/message-id/CAA5RZ0s%2BUUXekbeGcC-H71kW%3DMfeaUCOV%3DyEWX94NXViO2-%3DpA%40mail.gmail.com
    
    
    -- 
    Best regards,
    Shinya Kato
    NTT OSS Center