Re: another autovacuum scheduling thread

David Rowley <dgrowleyml@gmail.com>

From: David Rowley <dgrowleyml@gmail.com>

To: wenhui qiu <qiuwenhuifx@gmail.com>

Cc: Nathan Bossart <nathandbossart@gmail.com>, Sami Imseih <samimseih@gmail.com>, Robert Haas <robertmhaas@gmail.com>, Jeremy Schneider <schneider@ardentperf.com>, pgsql-hackers@postgresql.org

Date: 2025-10-30T03:41:56Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Add rudimentary table prioritization to autovacuum.
- d7965d65fc5b 19 (unreleased) landed
Trigger more frequent autovacuums with relallfrozen
- 06eae9e6218a 18.0 cited
Harden nbtree page deletion.
- c34787f91058 14.0 cited
Check for interrupts inside the nbtree page deletion code.
- 3a01f68e35a3 12.0 cited

On Thu, 30 Oct 2025 at 15:58, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
> In fact, with the introduction of the vacuum_max_eager_freeze_failure_rate feature, if a table’s age still exceeds more than 1.x times the autovacuum_freeze_max_age, it suggests that the vacuum freeze process is not functioning properly. Once the age surpasses vacuum_failsafe_age, wraparound issues are likely to occur soon.Taking the average of vacuum_failsafe_age and autovacuum_freeze_max_age is not a complex approach. Under the default configuration, this average already exceeds four times the autovacuum_freeze_max_age. At that stage, a DBA should have already intervened to investigate and resolve why the table age is not decreasing.

I don't think anyone would like to modify PostgreSQL in any way that
increases the chances that a table gets as old as vacuum_failsafe_age.
Regardless of the order in which tables are vacuumed, if a table gets
as old as that then vacuum is configured to run too slowly, or there
are not enough workers configured to cope with the given amount of
work. I think we need to tackle prioritisation and rate limiting as
two separate items. Nathan is proposing to improve the prioritisation
in this thread and it seems to me that your concerns are with rate
limiting. I've suggested an idea that might help with reducing the
cost_delay based on the score of the table in this thread. I'd rather
not introduce that as a topic for further discussion here (I imagine
Nathan agrees). It's not as if the server is going to consume 1
billion xids in 5 mins. It's at least going to take a day to days or
longer for that to happen and if autovacuum has not managed to get on
top of the workload in that time, then it's configured to run too
slowly and the cost_limit or delay needs to be adjusted.

My concern is that there are countless problems with autovacuum and if
you try and lump them all into a single thread to fix them all at
once, we'll get nowhere. Autovacuum was added to core in 8.1, 20 years
ago and I don't believe we've done anything to change the ratelimiting
aside from reducing the default cost_delay since then. It'd be good to
fix that at some point, just not here, please.

FWIW, I agree with Nathan about keeping the score calculation
non-magical. The score should be simple and easy to document. We can
introduce complexity to it as and when it's needed and when the
supporting evidence arrives, rather than from people waving their
hands.

David