Re: another autovacuum scheduling thread
From: David Rowley <dgrowleyml@gmail.com>
To: Sami Imseih <samimseih@gmail.com>
Cc: Nathan Bossart <nathandbossart@gmail.com>,
Robert Haas <robertmhaas@gmail.com>, Jeremy Schneider <schneider@ardentperf.com>, pgsql-hackers@postgresql.org
Date: 2025-10-27T23:16:28Z
Lists: pgsql-hackers
Commits
- Add rudimentary table prioritization to autovacuum.
  d7965d65fc5b, 19 (unreleased), landed
- Trigger more frequent autovacuums with relallfrozen
  06eae9e6218a, 18.0, cited
- Harden nbtree page deletion.
  c34787f91058, 14.0, cited
- Check for interrupts inside the nbtree page deletion code.
  3a01f68e35a3, 12.0, cited

On Tue, 28 Oct 2025 at 11:35, Sami Imseih <samimseih@gmail.com> wrote:
> We discuss the threshold calculations in the documentation, and users
> can write scripts to monitor which tables are eligible. However, there
> is nothing that indicates which table autovacuum will work on next (I
> have been asked that question by users a few times, sometimes out of
> curiosity, or because they are monitoring vacuum activity and wondering
> when their important table will get a vacuum cycle, or if they should
> kick off a manual vacuum). With the scoring system, it will be much more
> difficult to explain, unless someone walks through the code.

I think it's reasonable to want to document how autovacuum prioritises
tables, but maybe not in too much detail.

Longer term, I think it would be good to have a pg_catalog view for
this which showed the relid or schema/relname, and the output values
of relation_needs_vacanalyze(). If we had that and we documented that
autovacuum workers work from that list, but that they may have an older
snapshot of it, then that might help make the score easier to
document. It would also allow people to question the scores, as I
expect at least some people might not agree with the priorities. That
would allow us to consider tuning the score calculation if someone
points out a deficiency with the current calculation.

Also, longer term, it doesn't seem that unreasonable that the
autovacuum worker might want to refresh tables_to_process once it
finishes a table, if autovacuum_naptime * $value units of time have
passed since the list was last checked. That would allow the worker to
react accordingly when scores have changed significantly since it last
checked. I mean, it might be days between when autovacuum calculates
the scores and when it finally vacuums the table, if the list is long
or if it was tied up with large tables. Other workers may have gotten
to some of the tables too, so a score may have dropped and then made
its way back above the threshold, though to a lesser extent.

David
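
To make the refresh idea above a bit more concrete, here is a rough Python sketch of the control flow only. It is not taken from any patch: run_worker, compute_scores, vacuum, now and refresh_factor are all hypothetical names, with refresh_factor standing in for "$value" and compute_scores standing in for whatever builds the scored list of eligible tables.

```python
from typing import Callable, List, Tuple

ScoredTable = Tuple[str, float]  # (table name, priority score); hypothetical shape


def run_worker(
    compute_scores: Callable[[], List[ScoredTable]],  # hypothetical: eligible tables with scores
    vacuum: Callable[[str], None],                    # hypothetical: vacuums one table
    now: Callable[[], float],                         # clock, in seconds
    naptime: float,                                   # stands in for autovacuum_naptime
    refresh_factor: float,                            # stands in for "$value" (an assumption)
) -> List[str]:
    """Process tables highest score first, rebuilding the list when it is stale."""

    def build_list() -> List[ScoredTable]:
        # Highest score first.
        return sorted(compute_scores(), key=lambda t: t[1], reverse=True)

    processed: List[str] = []
    todo = build_list()
    last_refresh = now()

    while todo:
        name, _score = todo.pop(0)
        vacuum(name)
        processed.append(name)

        # After finishing a table, rebuild the list if it is older than
        # naptime * refresh_factor, so changed scores are picked up.
        if todo and now() - last_refresh >= naptime * refresh_factor:
            todo = build_list()
            last_refresh = now()

    return processed
```

The sketch assumes compute_scores only returns tables that are currently above the threshold, so a table another worker has already handled simply drops out at the next refresh.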