Thread

Experimental patch for terminating VACUUM freeze blockers

wenhui qiu <qiuwenhuifx@gmail.com> — 2026-05-13T11:56:43Z

Hi all,

When a table’s age reaches vacuum_failsafe_age, VACUUM enters failsafe mode
and bypasses nonessential throttling such as autovacuum_vacuum_cost_delay so
it can make progress as aggressively as possible.

However, if VACUUM’s freeze horizon is blocked by an old transaction, the
failsafe behavior alone cannot reduce the table age. In that case, VACUUM
may run at full speed but still be unable to advance relfrozenxid.

I have an experimental patch to explore handling this situation. The patch
adds a GUC, vacuum_freeze_terminate_blockers_pid, which allows VACUUM to
terminate regular client backends whose transaction horizon blocks VACUUM
from advancing its freeze cutoff. The intended targets are
idle-in-transaction sessions and long-running active transactions that are
holding an old xmin or assigned XID.

One possible question is why not rely on idle_in_transaction_session_timeout.
In practice, this parameter is often not configured, and even when it is,
it only addresses idle-in-transaction sessions. It does not help with
active long-running transactions that can also hold back VACUUM’s freeze
horizon.

The patch deliberately does not try to handle other causes of freeze
horizon retention, such as replication slots, hot standby feedback, or
prepared transactions.

This is experimental and intended for discussion.

Re: Experimental patch for terminating VACUUM freeze blockers

Nathan Bossart <nathandbossart@gmail.com> — 2026-05-13T14:23:26Z

On Wed, May 13, 2026 at 07:56:43PM +0800, wenhui qiu wrote:
> I have an experimental patch to explore handling this situation. The patch
> adds a GUC, vacuum_freeze_terminate_blockers_pid, which allows VACUUM to
> terminate regular client backends whose transaction horizon blocks VACUUM
> from advancing its freeze cutoff. The intended targets are
> idle-in-transaction sessions and long-running active transactions that are
> holding an old xmin or assigned XID.

Thanks for sharing.  I certainly agree that this area has room for
improvement in Postgres.

> The patch deliberately does not try to handle other causes of freeze
> horizon retention, such as replication slots, hot standby feedback, or
> prepared transactions.

My experience is a bit dated, but I remember the two main issues being
replication slots and temporary tables.  We now have
idle_replication_slot_timeout, and there's a somewhat active thread on an
XID version of that parameter [0], but I'm not aware of any recent ideas
about how to deal with stranded temporary tables.  I'd encourage you to
think about these problems, too.

[0] https://postgr.es/m/CA%2B-JvFsMHckBMzsu5Ov9HCG3AFbMh056hHy1FiXazBRtZ9pFBg%40mail.gmail.com

-- 
nathan

Re: Experimental patch for terminating VACUUM freeze blockers

Sami Imseih <samimseih@gmail.com> — 2026-05-13T19:55:00Z

> On Wed, May 13, 2026 at 07:56:43PM +0800, wenhui qiu wrote:
> > I have an experimental patch to explore handling this situation. The patch
> > adds a GUC, vacuum_freeze_terminate_blockers_pid, which allows VACUUM to
> > terminate regular client backends whose transaction horizon blocks VACUUM
> > from advancing its freeze cutoff. The intended targets are
> > idle-in-transaction sessions and long-running active transactions that are
> > holding an old xmin or assigned XID.
>
> Thanks for sharing.  I certainly agree that this area has room for
> improvement in Postgres.

My 2c. Using something like the proposed
vacuum_freeze_terminate_blockers_pid (GUC name is misleading, since
it's a bool )
seems backwards to me. It does not address the root cause, which is
the long-running
transaction, etc and attempts to deal with the symptom rather than the problem.
This also means a poor configuration of this parameter will more
likely lead to a system
silently getting into wraparound, as a DBA may relax a bit on monitoring, maybe.

I do think we need better visibility into what is blocking vacuum, which was
discussed here [1], but ultimately it is up to the DBA to properly monitor
and mitigate workloads that are impacting their vacuum.

[1] https://www.postgresql.org/message-id/flat/CAOzEurSgy-gDtwFmEbj5%2BR9PL0_G3qYB6nnzJtNStyuf87VSVg%40mail.gmail.com

--
Sami Imseih
Amazon Web Services (AWS)

Re: Experimental patch for terminating VACUUM freeze blockers

wenhui qiu <qiuwenhuifx@gmail.com> — 2026-05-14T03:59:47Z

Hi Sami

> My 2c. Using something like the proposed
> vacuum_freeze_terminate_blockers_pid (GUC name is misleading, since
> it's a bool )
> seems backwards to me. It does not address the root cause, which is
> the long-running
> transaction, etc and attempts to deal with the symptom rather than the
problem.
> This also means a poor configuration of this parameter will more
> likely lead to a system
> silently getting into wraparound, as a DBA may relax a bit on monitoring,
maybe.

Thanks for your comments. I agree that the real problem is the long-running
transaction or other blocker, and this patch is not meant to replace
monitoring or fixing that root cause. The idea was inspired by the
replay/query conflict mechanism. Normally we do not want to cancel
long-running queries that block replay either; alternatively, we can enable
hot_standby_feedback, but that brings other problems. So this is a
trade-off left to the DBA. I think this  is similar: the right solution is
still to prevent xid wraparound by managing the blockers, but when the
system is already approaching wraparound, terminating blockers may be
preferable to letting the system continue toward a dangerous state. So this
is intended as a last-resort option, not as the primary solution.

> I do think we need better visibility into what is blocking vacuum, which
was
> discussed here [1], but ultimately it is up to the DBA to properly monitor
> and mitigate workloads that are impacting their vacuum.

> [1]
https://www.postgresql.org/message-id/flat/CAOzEurSgy-gDtwFmEbj5%2BR9PL0_G3qYB6nnzJtNStyuf87VSVg%40mail.gmail.com

I'm in this thread,I know you're good at adding relevant system views.
Often, checking system views is more convenient than looking at logs. If
this request is approved, I’d appreciate your help in implementing the
corresponding system views.
.

Thanks

On Thu, May 14, 2026 at 3:55 AM Sami Imseih <samimseih@gmail.com> wrote:

> > On Wed, May 13, 2026 at 07:56:43PM +0800, wenhui qiu wrote:
> > > I have an experimental patch to explore handling this situation. The
> patch
> > > adds a GUC, vacuum_freeze_terminate_blockers_pid, which allows VACUUM
> to
> > > terminate regular client backends whose transaction horizon blocks
> VACUUM
> > > from advancing its freeze cutoff. The intended targets are
> > > idle-in-transaction sessions and long-running active transactions that
> are
> > > holding an old xmin or assigned XID.
> >
> > Thanks for sharing.  I certainly agree that this area has room for
> > improvement in Postgres.
>
> My 2c. Using something like the proposed
> vacuum_freeze_terminate_blockers_pid (GUC name is misleading, since
> it's a bool )
> seems backwards to me. It does not address the root cause, which is
> the long-running
> transaction, etc and attempts to deal with the symptom rather than the
> problem.
> This also means a poor configuration of this parameter will more
> likely lead to a system
> silently getting into wraparound, as a DBA may relax a bit on monitoring,
> maybe.
>
> I do think we need better visibility into what is blocking vacuum, which
> was
> discussed here [1], but ultimately it is up to the DBA to properly monitor
> and mitigate workloads that are impacting their vacuum.
>
> [1]
> https://www.postgresql.org/message-id/flat/CAOzEurSgy-gDtwFmEbj5%2BR9PL0_G3qYB6nnzJtNStyuf87VSVg%40mail.gmail.com
>
> --
> Sami Imseih
> Amazon Web Services (AWS)
>