Re: another autovacuum scheduling thread

From: David Rowley <dgrowleyml@gmail.com>

To: Sami Imseih <samimseih@gmail.com>

Cc: Nathan Bossart <nathandbossart@gmail.com>, Robert Haas <robertmhaas@gmail.com>, Jeremy Schneider <schneider@ardentperf.com>, pgsql-hackers@postgresql.org

Date: 2025-10-23T22:39:55Z

Lists: pgsql-hackers

Commits

Add rudimentary table prioritization to autovacuum.
- d7965d65fc5b 19 (unreleased) landed
Trigger more frequent autovacuums with relallfrozen
- 06eae9e6218a 18.0 cited
Harden nbtree page deletion.
- c34787f91058 14.0 cited
Check for interrupts inside the nbtree page deletion code.
- 3a01f68e35a3 12.0 cited

On Fri, 24 Oct 2025 at 09:48, Sami Imseih <samimseih@gmail.com> wrote:
> Yes, in my last reply, I did indicate that the sort will likely not be
> the operation that tips performance over, but rather the
> catalog scan itself, which I have seen not scale well as the number of
> relations grows (in cases of thousands or hundreds of thousands of tables).
> If we are to prioritize vacuuming by M(XID), then it will be hard to avoid the
> catalog scan anymore in a future improvement.

I grant you that I could see that being a problem for a
sufficiently large number of tables and a small enough
autovacuum_naptime, but I don't see how anything being proposed here
moves the goalposts on the requirement to scan pg_class. At a minimum,
we need to get the reloptions from somewhere, plus reltuples, relpages
and relallfrozen. We can't magic those values out of thin air. So, since
nothing is changing with regard to the scan of pg_class or which
columns we need to look at in that table, I don't know why we'd
consider it a topic to discuss on this thread. If this thread becomes
a dumping ground for unrelated problems, then nothing will be done to
fix the problem at hand.
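
To illustrate why each of those pg_class columns is needed, here is a rough Python sketch of the per-table threshold arithmetic. It is not the autovacuum.c code: the PgClassRow class and the helper names are hypothetical, reloptions is simplified to a dict, and the defaults are the documented GUC defaults. The insert threshold is scaled by the unfrozen fraction of the table, which is my understanding of what the relallfrozen change in 18 does.

```python
from dataclasses import dataclass, field

# Documented default settings; a table's reloptions may override any of them.
DEFAULTS = {
    "autovacuum_vacuum_threshold": 50,
    "autovacuum_vacuum_scale_factor": 0.2,
    "autovacuum_vacuum_insert_threshold": 1000,
    "autovacuum_vacuum_insert_scale_factor": 0.2,
}


@dataclass
class PgClassRow:
    """Hypothetical stand-in for the pg_class columns the scan has to read."""
    relname: str
    reltuples: float      # -1 means the table has never been vacuumed/analyzed
    relpages: int
    relallfrozen: int
    reloptions: dict = field(default_factory=dict)


def setting(row: PgClassRow, name: str) -> float:
    """Per-table reloption if set, otherwise the global default."""
    return row.reloptions.get(name, DEFAULTS[name])


def vacuum_thresholds(row: PgClassRow) -> tuple[float, float]:
    """Return (dead-tuple threshold, insert threshold) for one table."""
    reltuples = max(row.reltuples, 0.0)

    # Fraction of pages not yet all-frozen; assume fully unfrozen if unknown.
    pcnt_unfrozen = 1.0
    if row.relpages > 0 and row.relallfrozen > 0:
        frozen = min(row.relallfrozen, row.relpages)
        pcnt_unfrozen = 1.0 - frozen / row.relpages

    dead_thresh = (setting(row, "autovacuum_vacuum_threshold")
                   + setting(row, "autovacuum_vacuum_scale_factor") * reltuples)
    ins_thresh = (setting(row, "autovacuum_vacuum_insert_threshold")
                  + setting(row, "autovacuum_vacuum_insert_scale_factor")
                  * reltuples * pcnt_unfrozen)
    return dead_thresh, ins_thresh


row = PgClassRow("t", reltuples=10000.0, relpages=100, relallfrozen=75,
                 reloptions={"autovacuum_vacuum_scale_factor": 0.5})
dead_thresh, ins_thresh = vacuum_thresholds(row)
```

With those made-up numbers the dead-tuple threshold comes out at about 5050 and the insert threshold at about 1500. Every input to that arithmetic comes from the table's pg_class row, so whatever ordering is applied afterwards, the rows still have to be read first.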

David