Re: another autovacuum scheduling thread

Sami Imseih <samimseih@gmail.com>

From: Sami Imseih <samimseih@gmail.com>

To: Nathan Bossart <nathandbossart@gmail.com>

Cc: David Rowley <dgrowleyml@gmail.com>, Robert Haas <robertmhaas@gmail.com>, Jeremy Schneider <schneider@ardentperf.com>, pgsql-hackers@postgresql.org

Date: 2025-10-27T17:47:15Z

Lists: pgsql-hackers

Commits

Add rudimentary table prioritization to autovacuum.
- d7965d65fc5b 19 (unreleased) landed
Trigger more frequent autovacuums with relallfrozen
- 06eae9e6218a 18.0 cited
Harden nbtree page deletion.
- c34787f91058 14.0 cited
Check for interrupts inside the nbtree page deletion code.
- 3a01f68e35a3 12.0 cited

I spent some time looking at this, and I am not sure how much this
will move the needle, since most of the time the bottleneck for
autovacuum is the limited number of workers and large tables that
take a long time to process.

That said, this is a good change for the simple reason that it is
better to have a well-defined prioritization strategy for autovacuum
than something that is somewhat random, as mentioned earlier.

Just a couple of comments on v5:

1/ Should we add documentation explaining this prioritization behavior in [0]?

I wrote a SQL query that returns the tables and their scores, which I
found useful when testing this, so having the actual rules spelled out
in the docs would be very helpful.

If we don't want to go into that much depth, at minimum the docs should say:

"Autovacuum prioritizes tables based on how far they exceed their thresholds
or if they are approaching wraparound limits." so a DBA can understand
this behavior.

2/
* The score is calculated as the maximum of the ratios of each of the table's
* relevant values to its threshold. For example, if the number of inserted
* tuples is 100, and the insert threshold for the table is 80, the insert
* score is 1.25.
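
To make the quoted comment concrete, here is a rough sketch of the
"maximum of ratios" rule. It is only an illustration, not the patch's
code: the function names, the category names, and the update/delete
numbers are made up; only the 100 / 80 insert example comes from the
comment.

```python
def ratio(value: float, threshold: float) -> float:
    """Return value / threshold, or 0.0 if the threshold is not positive."""
    return value / threshold if threshold > 0 else 0.0


def table_score(counters: dict[str, tuple[float, float]]) -> float:
    """Score = maximum over all categories of (current value / threshold).

    `counters` maps a hypothetical category name to (value, threshold).
    """
    return max((ratio(v, t) for v, t in counters.values()), default=0.0)


# Insert example from the comment: 100 inserted tuples vs. a threshold of 80.
# The second entry is invented to show that the larger ratio wins.
example = {"insert": (100, 80), "update_delete": (30, 60)}
print(table_score(example))  # 1.25
```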

Should we consider clamping the score when reltuples = -1? Otherwise,
the scores for such tables (new tables with a large amount of ingested
data) will be over-inflated. Perhaps, if reltuples = -1 (number of
tuples not known), we could assign a score of 0.5, so that we neither
over-prioritize these tables nor push them to the bottom.
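
A simplified sketch of the concern, assuming the insert threshold is
base threshold + scale factor * reltuples with the default settings
(1000 and 0.2) and that an unknown reltuples is treated as zero. The
function name and the clamp_unknown flag are hypothetical, and 0.5 is
just the value floated above, not a settled choice.

```python
CLAMPED_SCORE = 0.5  # value suggested in this email; an assumption, not settled


def insert_score(ins_tuples: float, reltuples: float,
                 base_threshold: float = 1000, scale_factor: float = 0.2,
                 clamp_unknown: bool = False) -> float:
    """Simplified insert score: inserted tuples / insert threshold."""
    if reltuples < 0:
        # reltuples = -1 means the table has never been vacuumed or analyzed.
        if clamp_unknown:
            return CLAMPED_SCORE
        reltuples = 0  # threshold collapses to the base threshold alone
    threshold = base_threshold + scale_factor * reltuples
    return ins_tuples / threshold


# A freshly loaded table with 10 million inserted tuples and unknown reltuples.
print(insert_score(10_000_000, -1))                      # 10000.0
print(insert_score(10_000_000, -1, clamp_unknown=True))  # 0.5
```

Without the clamp, the new table's score is several orders of magnitude
above that of an already-analyzed table with the same number of
inserts, because the scale factor contributes nothing to its threshold.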

[0] https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM

--
Sami Imseih
Amazon Web Services