Re: another autovacuum scheduling thread
From: Robert Haas <robertmhaas@gmail.com>
To: Nathan Bossart <nathandbossart@gmail.com>
Cc: Robert Treat <rob@xzilla.net>, David Rowley <dgrowleyml@gmail.com>, Sami Imseih <samimseih@gmail.com>,
Jeremy Schneider <schneider@ardentperf.com>, pgsql-hackers@postgresql.org
Date: 2025-11-20T14:30:42Z
Lists: pgsql-hackers
Commits
- Add rudimentary table prioritization to autovacuum.
  d7965d65fc5b, 19 (unreleased), landed
- Trigger more frequent autovacuums with relallfrozen
  06eae9e6218a, 18.0, cited
- Harden nbtree page deletion.
  c34787f91058, 14.0, cited
- Check for interrupts inside the nbtree page deletion code.
  3a01f68e35a3, 12.0, cited

On Wed, Nov 12, 2025 at 3:10 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I do think re-prioritization is worth considering, but IMHO we should leave
> it out of phase 1. I think it's pretty easy to reason about one round of
> prioritization being okay. The order is completely arbitrary today, so how
> could ordering by vacuum-related criteria make things any worse? In my
> view, changing the list contents in fancier ways (e.g., adding
> just-processed tables back to the list) is a step further that requires
> more discussion and testing.

I agree with your view around reprioritization. To answer your rhetorical question, the way that reordering the list could hurt is if the current ordering (pg_class scan order) happened to be a near-optimal choice. For example, suppose the last table in pg_class order is in a state where vacuuming appears to be necessary but will be painful and/or useless (VACUUM will error, xmin will prevent all or most tuple removal, located on an incredibly slow disk with nothing cached, whatever). Re-sorting the list figures to move that table earlier, which will not work out for the best.

I suspect that reprioritization actually increases the danger of this kind of failure mode. The more aggressive you are about making sure that the highest-priority tables actually get handled first, the more important it is to be correct about the real order of priority.

I do think in the long term a really good system is probably going to accumulate a bunch of extra logic to deal with cases like this. For example, if the first table in the queue causes VACUUM to spend an hour chugging away and then fail with an I/O error, we would ideally want to make sure to wait a while before retrying that table, so that others don't get starved. But like you say, there's no need to solve every problem at once. What seems important to me for this patch is that we don't choose an actively bad sort order.

For instance, if we don't get the balance between prioritizing anti-wraparound activity and controlling runaway bloat correct, and especially if there's no way to recover by tweaking settings, to me that's a scary scenario. I do think it's fairly realistic for a bad choice of sort order to end up being a regression over the current lack of a sort order. You might just be getting lucky right now -- say, because the catalog tables all occur first in the catalog and vacuuming those tends to be important, and among user tables, the ones you created first are actually the ones that are most important. That's not a particularly crazy scenario, IMHO.

Point being: I think we need to avoid the mindset that we can't be stupider than we are now. I don't think there's any way we would commit something that is GENERALLY stupider than we are now, but it's not about averages. It's about whether there are specific cases that are common enough to worry about which end up getting regressed. I'm honestly not sure how much of a risk that is, and, again, I'm not trying to kill the patch. It might well be that the patch is already good enough that such scenarios will be extremely rare. However, it's easy to get overconfident when replacing a completely unintelligent system with a smarter one. The risk of something backfiring can sometimes be higher than one anticipates.

One idea that might be worth considering is adding a reloption of some kind that lets the user exert positive control over the sort order. I know that's scope creep, so maybe it's a bad idea for that reason. But I think it would be a better idea than Sami's proposal to score system catalogs more highly, not so much because his idea is necessarily wrong-headed as because it doesn't help with what I see as the principal danger here, namely, that whatever we do will sometimes turn out to be wrong. Trying to be right 100% of the time is not going to work out as well as having a backup plan for the cases where we are wrong.
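
To make the two "backup plan" ideas above concrete, here is a rough Python sketch of how a retry delay after a failed vacuum and a user-controlled priority override might interact with a computed score. Everything in it is hypothetical and for illustration only: the names `Table`, `score`, `user_priority`, `retry_after`, `order_for_vacuum`, and `record_failure` are invented here, no such reloption exists, and none of this is taken from the patch or from PostgreSQL's autovacuum code.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    score: float              # hypothetical computed urgency; higher is more urgent
    user_priority: int = 0    # hypothetical reloption-style override; higher goes first
    retry_after: float = 0.0  # earliest time (in seconds) this table may be tried again


def order_for_vacuum(tables, now):
    """Return the tables eligible at time `now`, most urgent first."""
    # Skip tables that failed recently, so one bad table cannot starve the rest.
    eligible = [t for t in tables if t.retry_after <= now]
    # The user's override dominates; the computed score only breaks ties.
    return sorted(eligible, key=lambda t: (-t.user_priority, -t.score))


def record_failure(table, now, delay=3600.0):
    """After a failed vacuum, keep the table out of the queue for `delay` seconds."""
    table.retry_after = now + delay


if __name__ == "__main__":
    tables = [
        Table("big_bloated", score=9.0),
        Table("orders", score=5.0),
        Table("audit_log", score=2.0, user_priority=1),
    ]
    print([t.name for t in order_for_vacuum(tables, now=0.0)])
    record_failure(tables[0], now=0.0)
    print([t.name for t in order_for_vacuum(tables, now=10.0)])
```

The point of the sketch is only that the override and the delay sit outside the scoring function, so a wrong score can be corrected without changing how the score is computed.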

--
Robert Haas
EDB: http://www.enterprisedb.com